The catch with O(n)
Understanding the "big-O" of algorithms and processes is extremely useful, in my opinion. To some people it seems to sound very theoretical and not very practical, but I constantly think about it basically any time I process data. I don't take out pen and paper to prove the runtime or whatever, but I quickly think in my head whether this is O(n) or O(n^2) or O(2^n).
If you're being reasonable, that is. But "big-O" doesn't have the universality that perhaps a mathematician craves (even though it's 100% mathematical!). Basically, "big-O" by itself doesn't spit out whether an algorithm is good or bad, and you can make it totally useless with, let's say, "adversarial" input.
I'll start in reasonable and hopefully useful territory, but we'll end up down in the not-really-reasonable and the no-further-than-technically-correct.
Definition of "big-O"
"Big-O" is just a way of notating runtime complexity, which is what I will say from now on. Runtime complexity measures how well an algorithm scales when the size of some input increases. If you do a for loop on n items, and then the same loop on 2n items, runtime will about double. However, if you have a nested for loop (let's say if you're doing something with each pair of items in a list) with n items, then the case for 2n items will quadruple. In a more general nested loop case, if you start with n items and then scale to a*n items, the runtime will get multiplied by a factor of a^2. Therefore, the runtime complexity is quadratic, or O(n^2).
The first time I heard about runtime complexity I thought it was an approximate handwavy thingy and, while it is in some sense approximate, it has a rigorous mathematical definition. Namely:
A function f is O(g) if there exists an x_0 and a positive a, such that for every x greater than x_0, f(x) < a * g(x).
Or, more compactly:

f ∈ O(g)  ⟺  ∃ a > 0, ∃ x_0 : ∀ x > x_0, f(x) < a * g(x)
And, in English, it says that f is O(g) if, eventually, f can't "overtake" some suitably scaled version of g.
The a there should make you suspicious. We can "choose" a to be any positive number, so how is it possible that f(x) < a * g(x) could be false? We can make a = 100, a = 99999999999, or whatever we want! Well, the catch is that if f grows faster than g, then it's always going to overtake it at some point, and keep overtaking it.
To make it more concrete, let's take f(x) = x^2 and g(x) = x, so a * g(x) = a * x. It doesn't matter how "steep" we make a*x, because x^2 is going to overtake it at some point. In fact, we can calculate exactly when that is!
Let's first find where they are the same:
x^2 = a*x   ⟹   x = 0  or  x = a
This example works out nicely because there are so few constants, but it says that, no matter what a is, f is going to reach a * g at precisely the point x = a (for example, with a = 1000, x^2 catches up with 1000x exactly at x = 1000). From that point on, x^2 just grows faster.
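If you want to see it numerically, here's a one-line sanity check with an arbitrarily chosen a = 1000:

```rust
fn main() {
    let a: u64 = 1000;
    // First x where x^2 is at least as big as a*x.
    let crossover = (1u64..).find(|&x| x * x >= a * x).unwrap();
    println!("x^2 catches {a}*x at x = {crossover}"); // prints 1000
}
```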
So what does runtime complexity tell us, really?
If a function, such as the runtime of a program, is O(g), it means that it eventually grows no faster than (some scaled version of) g. No more, no less.
Implications
Now that we know what we're talking about, we can discuss and understand various properties of complexity analysis. Again, from the practical to the esoteric.
O(n) only talks about "eventually"
The definition of runtime complexity is fundamentally asymptotic, which basically means it "only really matters" at "infinity". In other words, saying that the runtime of a program is O(x^2) means that it is, eventually, just as good/bad as regular x^2 [1]. But "eventually" is never defined, and this statement is only fully true at infinity. At every point before infinity (even though that doesn't really make sense), the statement is a bit false, but it gets closer to true as the values get bigger.
This is something that is actually good to keep in mind. The most common example is probably when you use HashSets or HashMaps (e.g., when you want to keep track of which elements you have visited and which you haven't). A HashSet (with the correct properties) gives you the optimal O(1) time to check whether an element is present, and lets you add an element in O(1) too. But, if the set of numbers you want to keep track of is relatively small (say, 10 or 20 numbers), then it might be faster to use a regular Vec, even though checking it is O(n).
Why? Because a HashSet has a lot of additional overhead (allocations, bookkeeping, searching within buckets, and so on). This overhead remains constant [2] even as the number of inputs increases, but if you never (or rarely) have that many inputs, it might not be worth it.
This graph visualizes how the Vec starts out faster but falls behind as the number of elements grows. It's just for illustration purposes; the scales of the axes are completely made up.
 ^
 |          / Vec's time to add one element
 |         /
 |        /
 |       /
 |      /     HashSet's time to add one element
 |-----/-------------------------------------
 |    /
 |   /
 |  /
 | /
 |/
 +-----+-------------------------------------> number of elements
  Vec  |           <- HashSet wins ->
  wins |
If this is something you didn't know, you should keep it in mind because it may come in handy pretty frequently.
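For concreteness, here's a rough, unscientific sketch of how you could measure this yourself. The sizes, the number of lookups, and where the crossover lands are machine- and type-dependent assumptions, so treat it as illustrative only:

```rust
use std::collections::HashSet;
use std::time::Instant;

// For each size, time a bunch of membership checks done with a linear scan
// (Vec::contains, O(n)) versus a hash lookup (HashSet::contains, O(1)).
fn main() {
    for size in [4, 16, 64, 256, 1024] {
        let vec: Vec<u64> = (0..size).collect();
        let set: HashSet<u64> = vec.iter().copied().collect();
        let lookups: Vec<u64> = (0..size).cycle().take(100_000).collect();

        let mut hits = 0;

        let start = Instant::now();
        for x in &lookups {
            if vec.contains(x) {
                hits += 1;
            }
        }
        let vec_time = start.elapsed();

        let start = Instant::now();
        for x in &lookups {
            if set.contains(x) {
                hits += 1;
            }
        }
        let set_time = start.elapsed();

        println!("n={size:5}  Vec: {vec_time:?}  HashSet: {set_time:?}  (checksum {hits})");
    }
}
```

The exact crossover also depends a lot on the element type (hashing a u64 is cheap, hashing a long string is not), so if it matters, measure with your real data.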
O(n) can give you insight
Do you know A*? It's basically just BFS, except that it takes advantage of some notion of "distance" to the goal (called a heuristic). In short, it looks at all the queued neighbors, estimates how far each one is from the goal, then takes the closest one and expands that. It's pretty neat. But it has a problem, which is that it has to keep track of all of the nodes queued for visiting, which can be a lot (say, many gigabytes). What can you do instead? Use IDA*! Instead of always expanding the closest node, you search up to a fixed depth and, if you haven't found the solution, throw it all away and search again with a bigger depth!
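To make the structure concrete, here's a compact IDA* sketch on a toy problem of my own choosing (not a cube: the moves, the costs, and the deliberately dumb heuristic are all made up for illustration; the point is the bounded depth-first search plus the "throw it all away and retry deeper" loop):

```rust
// Toy problem: reach a target number starting from 0, using the moves
// +1 and *2, each costing 1. The heuristic is 0 at the goal and 1 otherwise,
// which is admissible but not very smart.
fn heuristic(state: u64, target: u64) -> u64 {
    if state == target { 0 } else { 1 }
}

/// Depth-first search bounded by `threshold`.
/// Returns Ok(cost) if the goal was found, Err(next threshold candidate) otherwise.
fn search(state: u64, g: u64, threshold: u64, target: u64) -> Result<u64, u64> {
    let f = g + heuristic(state, target);
    if f > threshold {
        return Err(f);
    }
    if state == target {
        return Ok(g);
    }
    let mut next_threshold = u64::MAX;
    for next in [state + 1, state * 2] {
        match search(next, g + 1, threshold, target) {
            Ok(cost) => return Ok(cost),
            Err(t) => next_threshold = next_threshold.min(t),
        }
    }
    Err(next_threshold)
}

fn ida_star(target: u64) -> u64 {
    let mut threshold = heuristic(0, target);
    loop {
        match search(0, 0, threshold, target) {
            Ok(cost) => return cost,
            // Discard all the work done so far and restart with a deeper bound.
            Err(t) => threshold = t,
        }
    }
}

fn main() {
    println!("{}", ida_star(37)); // 8 moves, e.g. 0→1→2→4→8→9→18→36→37
}
```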
The first time I heard about this it sounded ludicrous. "What do you mean throw all your search away? That seems so wasteful! It's going to be slow. Like, what even is the time complexity of that??"
So what is the time complexity of BFS, A* and IDA* in, for example, a tree-like search space such as solving a Rubik's cube? Long story short, they're all exponential: O(exp n). This is surprising, because A* seems so much more efficient than BFS, and IDA* seems so much less efficient than A*.
The intuitive reason why A* has the same runtime complexity as BFS is that, at the end of the day, you still have to search through the possible moves, and there are still very many of them. It would only be better if you could prove certain properties of your heuristic which, if that were possible, would probably mean you could just use a different strategy altogether.
The reason that IDA* has the same runtime complexity as A* is that every time you increase the depth, the difference is so big that the previous steps stop being relevant. If you 10x the number of explorable states every time you increase the depth, then the whole search from depth 0 to n - 1 amounts to only about a ninth (roughly 11%) of the search at depth n alone. So yes, it is wasting compute and it is slower, but it doesn't change, for example, how feasible a computation is [3].
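To spell out the arithmetic (idealizing the branching factor as exactly 10 at every depth):

$$\sum_{d=0}^{n-1} 10^d \;=\; \frac{10^n - 1}{9} \;\approx\; \frac{10^n}{9} \;\approx\; 0.11 \cdot 10^n$$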
This is a wonderful example of why it's useful to study the complexity of algorithms. In this case, my intuitive gauge clashed with the reality of the complexity of the algorithm, and it helped me understand why I was wrong, because I was!
I say this as a kind of "rebuttal" to the sentiment that runtime complexity doesn't tell you anything and you have to "use your brain". I mean, yeah, you 100% should really understand what's happening, but runtime complexity analysis can be an aid in that regard.
When O(f) is actually O(1)
There's often the case where you have an algorithm that just screeeeeams some O(f). For example, solving a Rubik's cube, as we said before, is O(exp n).
...right?
Well, technically it isn't.
Let's start off by understanding what n is in this context. When does solving a Rubik's cube become more difficult? I think the most reasonable definition is to say it's the number of moves used to scramble the cube. And this is correct! ...but only up to a certain point.
But think about this:
- There are finitely many possible states of the Rubik's cube (very many, but finite)
- Each state can be solved in a finite number of moves
- Therefore, there has to be a maximum number of moves needed to solve any possible state, which we will refer to as K [4].
Where does the problem arise? Well, after K moves the complexity doesn't increase: no matter how many more moves you scramble with, the resulting state can still be solved in at most K moves. The asymptotic behavior of the complexity is, then, just O(1).
From the theoretical point of view, it might not be outright wrong to say that solving the Rubik's cube is O(exp n), but n <= K, so, in terms of runtime complexity:
- Solving the cube is O(exp K)
- K is a constant, so exp K is a constant, so O(exp K) is just as good as O(1)
Rubik's cube solvers run in constant time, technically.
If you tried implementing a Rubik's cube solver, then it probably also felt like O(exp n), but that's just because we get to the cases where n is "big enough to be slow" very quickly, and can rarely reach the cases where n is "too big" [5].
However, for simpler problems you can break out of the exponential part, so the fact that your algorithm looks like O(f(n)) but n is actually constant or has a maximum could give you a genuinely meaningful understanding of the problem/algorithm, even though, in fairness, these cases might be rare.
Most complexity analyses are wrong, lol
"Your algorithm looks like O(f(n)) but n is actually constant or has a maximum [...] in fairness, these cases might be rare"
Umm... yeah, actually I was lying, and basically every complexity analysis you have ever encountered/heard of/done is probably wrong, on a technicality.
First things first: what is the runtime complexity of adding numbers? If you think about it, it's O(log n), where n is the number of possible values. But you've probably never actually counted it as O(log n). Why? Because the number of possible values is fixed, generally at 2^64. This happens with most other operations in the computer, so basically every analysis you've ever seen took this into account.
This is still something useful to keep in mind! Namely, in case you have a program that needs to go beyond 64 bits. In a lot of cases the runtime complexity will stay the same, because the size of the numbers grows more slowly than the size of the input, but that doesn't have to be the case.
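As a sketch of why addition scales with the size of the numbers, here's a hypothetical big-integer addition over 64-bit "limbs" (my own toy representation, not any particular library): the loop runs once per limb, i.e., O(log n) in the value being represented.

```rust
// Numbers are stored as little-endian "limbs" of 64 bits each. Adding two
// values loops over every limb, so the work is proportional to the number of
// limbs, i.e. O(log n) in the value n.
fn add_big(a: &[u64], b: &[u64]) -> Vec<u64> {
    let mut result = Vec::with_capacity(a.len().max(b.len()) + 1);
    let mut carry = 0u64;
    for i in 0..a.len().max(b.len()) {
        let x = *a.get(i).unwrap_or(&0);
        let y = *b.get(i).unwrap_or(&0);
        let (sum, c1) = x.overflowing_add(y);
        let (sum, c2) = sum.overflowing_add(carry);
        carry = (c1 as u64) + (c2 as u64);
        result.push(sum);
    }
    if carry != 0 {
        result.push(carry);
    }
    result
}

fn main() {
    // (1 + 1*2^64) + (0 + 1*2^64) = 1 + 2*2^64, i.e. limbs [1, 2].
    let sum = add_big(&[1, 1], &[0, 1]);
    assert_eq!(sum, vec![1, 2]);
    println!("{:?}", sum);
}
```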
But there are other things. Things like Vec. What is the runtime complexity of duplicating a vector? Technically, O(1) again! You might be able to see why now: even though it is generally O(n), in practice the computer can just... run out of memory. Every duplication beyond that size is constant time, because an application crash is O(1).
So yeah, we've reached the point where I don't believe the things I'm going to say from here onwards are useful at all. But it's fun to think about! For example, is there any algorithm that is not O(1)? It's hard to think of a practical example where you won't be able to find some limit of the computer/system at which the complexity just collapses to O(1). But I'm not sure there couldn't be one [6].
The universally optimal algorithm
Think about this: you can push the definition of O(n) to be very non-realistic because it assumes infinite memory and resources and time. You're probably not thinking big enough. You're trying to be reasonable but... you don't have to. You have infinite memory and infinite time. What can you do?
Think about IDA*, where you can "throw away" a lot of effort and still have the optimal time complexity. What if you did that, but for writing algorithms? Just take a problem, start writing every possible program, run each one and see if it solves it! If you carry this idea out carefully, you will have constructed an algorithm that solves every problem optimally (up to a constant factor, which big-O ignores anyway). It will probably be agonizingly slow for any kind of non-trivial problem, but the time complexity is still optimal! There's this nice video that explains it, if you want to know more of the details.
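Here's a cartoon of the idea in code, on a made-up toy language of mine (real universal search enumerates programs of an actual programming language and interleaves their execution, since they may not halt; every program in this toy language halts, so plain enumeration by length is enough):

```rust
// "Programs" are sequences of two instructions applied to 0.
#[derive(Clone, Copy, Debug)]
enum Op {
    Inc,    // x -> x + 1
    Double, // x -> x * 2
}

// Run a program and return its output.
fn run(program: &[Op]) -> u64 {
    program.iter().fold(0u64, |x, op| match op {
        Op::Inc => x + 1,
        Op::Double => x * 2,
    })
}

/// Enumerate all programs of length 0, 1, 2, ... and return the first one
/// (i.e. a shortest one) whose output equals `target`.
fn universal_search(target: u64) -> Vec<Op> {
    let mut frontier: Vec<Vec<Op>> = vec![vec![]];
    loop {
        for program in &frontier {
            if run(program) == target {
                return program.clone();
            }
        }
        // Extend every program by one instruction: all programs of length k+1.
        let mut next = Vec::new();
        for program in &frontier {
            for op in [Op::Inc, Op::Double] {
                let mut extended = program.clone();
                extended.push(op);
                next.push(extended);
            }
        }
        frontier = next;
    }
}

fn main() {
    // Prints a shortest program producing 21 (7 instructions long).
    println!("{:?}", universal_search(21));
}
```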
And, look, I know I've said I was done with the useful things... but there's still something to learn here.
I already said that when I first heard about O(n) as a concept I thought it was a "human approximation". It seemed hard to have a rigorous definition for the concept. But, turns out, there was! A totally rigorous, 100% mathematical definition. I didn't really mention it before but I think it's such a clever and nice way to encapsulate this concept.
And, funnily enough, I've now gone full circle. Yes, O(n) is a mathematical tool, but it is honestly not that mathematical in spirit? At least for me. Or, at least, deceptively mathematical. One of the main points of math, for me, is that it is universal and totally true without exception. The results of complexity analysis are totally true and well-defined and whatnot once you're inside the "math bubble", but not when you enter it and when you leave it. You have to, as a person, define what n is, what the inputs are, what we take as constant, at what point we discard results (is the size of a number constant? depends on how big the numbers could get). And then you have to understand whether the resulting time complexity, once everything is defined, is relevant or not. The optimal algorithm is optimal, but it's unreasonable. If your inputs don't tend to get big, it might be worse. And it might not even be clear what is better or worse, because it's hard to estimate how much, say, cache usage will affect results.
The only reason math is not totally useless is its interfaces to the real world. I think the typical mathematician says that they have zero interest in making anything useful, and there's the trope that the more useful something is, the less a mathematician will like it. I relate to that sentiment, even though clearly that's not quite it. The point is that the mathy math is what happens inside the math world, and these "interfaces" to the real world are, out of necessity, also part of math, but they're not the point.
The math part of complexity analysis is relatively brief, and the interfaces are pretty significant in comparison. Notice that the place where we "broke" O(n) is precisely in the interfaces to the real world. And of course it is! Math is unbreakable, but the real world is extremely fragile. At least in comparison. So yeah, I guess that was the one last thing I wanted to say. It is something that could be useful, but it is, in fairness, more abstract. Maybe that makes it more useful, even!
Addendum: O(log n) is so so so much better than O(n)
O(log n) is better than O(n). You probably already knew that. But like. It's soooooooooooooo much better. Maybe you already know this, but it took me way too long to realize. The quick intuition is that O(log n) is as good as O(exp n) is bad. And I think most people already treat O(exp n) as the thing that makes problems/algorithms infeasible. O(log n) becomes better and better (relative to O(n)) the bigger the input gets. A good example is linear vs binary search. And there is a good meme that demonstrates my point.
So, in the same way that O(exp n) makes problems infeasible, O(log n) should be thought of as so good it's almost negligible. O(log n) is way closer to O(1) than to O(n).
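A quick way to feel this is to count binary search steps. The snippet below is just illustrative (10 million elements instead of a billion, so it runs instantly), but the step count is the point: linear search can need as many comparisons as there are elements, while binary search needs only about log2 of that.

```rust
// Binary search over a sorted slice, counting how many comparisons it makes.
fn binary_search_steps(haystack: &[u64], needle: u64) -> (bool, u32) {
    let (mut lo, mut hi) = (0usize, haystack.len());
    let mut steps = 0;
    while lo < hi {
        steps += 1;
        let mid = lo + (hi - lo) / 2;
        if haystack[mid] == needle {
            return (true, steps);
        } else if haystack[mid] < needle {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    (false, steps)
}

fn main() {
    let data: Vec<u64> = (0..10_000_000).collect();
    let (found, steps) = binary_search_steps(&data, 9_999_999);
    // A linear scan would have needed ~10 million comparisons for this element.
    println!("found={found} in {steps} binary-search steps"); // ~24 steps
}
```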
And now we're actually done. Have a nice day ^^
[1] Technically, Theta(g) is the one that tells you something is just as good/bad as g, while O(g) only tells you it's not worse than g, but in practice we always take the best reasonable O(g), which is basically what Theta(g) is.

[2] Amortized-constant, technically.

[3] The reason why IDA* can be better than A* in tree-like searches is that A* is O(exp n) in memory complexity while IDA* is just O(n), with the same runtime complexity! In fact, in some ways, IDA* is to A* what DFS is to BFS.

[4] The actual number happens to be 20, but that's a very non-obvious result which is not relevant to the argument. The only thing necessary is to understand that such a limit exists, which is a lot simpler.

[5] In my experience, I don't think it took more than 6 or 7 moves for the solving speed to grind to a halt.

[6] It seems to me that there should be a limit on how much information can exist in the universe, but I can't justify it too well. I know that fundamental particles are quantized and can't be in the same state, but position seems to be continuous? So maybe there are infinitely many possible states. And either way, there are fundamental things we don't know about how physics works, so it seems hard to answer conclusively.