Reputation: 62051
I'd prefer as little formal definition as possible and simple mathematics.
Upvotes: 5393
Views: 844470
Reputation: 91142
Big-O notation (also called "asymptotic growth" notation) is what functions "look like" when you ignore constant factors and stuff near the origin. We use it to talk about how things scale.
Basics
for "sufficiently" large inputs...
f(x) ∈ O(upperbound) means f "grows no faster than" upperbound
f(x) ∈ Ɵ(justlikethis) means f "grows exactly like" justlikethis
f(x) ∈ Ω(lowerbound) means f "grows no slower than" lowerbound
Big-O notation doesn't care about constant factors: the function 9x² is said to "grow exactly like" 10x². Neither does big-O asymptotic notation care about non-asymptotic stuff ("stuff near the origin" or "what happens when the problem size is small"): the function 10x² is said to "grow exactly like" 10x² - x + 2.
Why would you want to ignore the smaller parts of the equation? Because they are completely dwarfed by the big parts of the equation as you consider larger and larger scales; their contribution becomes irrelevant. (See example section.)
Put another way, it's all about the ratio as you go to infinity. If you divide the actual time it takes by the O(...)
, you will get a constant factor in the limit of large inputs. Intuitively this makes sense: functions "scale like" one another if you can multiply one to get the other. That is, when we say...
actualAlgorithmTime(N) ∈ O(bound(N))
e.g. "time to mergesort N elements
is O(N log(N))"
... this means that for "large enough" problem sizes N (if we ignore stuff near the origin), there exists some constant (e.g. 2.5, completely made up) such that:
actualAlgorithmTime(N) / bound(N) < constant
e.g. mergesort_duration(N) / (N log(N)) < 2.5
There are many choices of constant; often the "best" choice is known as the "constant factor" of the algorithm... but we often ignore it like we ignore non-largest terms (see Constant Factors section for why they don't usually matter). You can also think of the above equation as a bound, saying "In the worst-case scenario, the time it takes will never be worse than roughly N*log(N)
, within a factor of 2.5 (a constant factor we don't care much about)".
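The bounded-ratio idea can be checked numerically. The sketch below is only an illustration: it counts comparisons as a stand-in for running time (an assumption, since real durations depend on the machine), and `merge_sort` and `ratio_to_bound` are hypothetical helper names, not part of any library.

```python
import math
import random

def merge_sort(items):
    """Return (sorted_list, number_of_comparisons) for a top-down merge sort."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_cost = merge_sort(items[:mid])
    right, right_cost = merge_sort(items[mid:])
    merged, comparisons = [], left_cost + right_cost
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one element comparison per loop iteration
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons

def ratio_to_bound(n, seed=0):
    """Comparisons used on a random list of size n, divided by n*log2(n)."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    _, comparisons = merge_sort(data)
    return comparisons / (n * math.log2(n))

if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, round(ratio_to_bound(n), 3))
```

For every size tried, the ratio stays below a fixed constant, which is exactly what the O(N log(N)) statement promises.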
In general, O(...) is the most useful one because we often care about worst-case behavior. If f(x) represents something "bad" like the processor or memory usage, then "f(x) ∈ O(upperbound)" means "upperbound is the worst-case scenario of processor/memory usage".
Applications
As a purely mathematical construct, big-O notation is not limited to talking about processing time and memory. You can use it to discuss the asymptotics of anything where scaling is meaningful, such as:
the number of possible handshakes among N people at a party (Ɵ(N²), specifically N(N-1)/2, but what matters is that it "scales like" N²)
Example
For the handshake example above, everyone in a room shakes everyone else's hand. In that example, #handshakes ∈ Ɵ(N²)
. Why?
Back up a bit: the number of handshakes is exactly n-choose-2 or N*(N-1)/2
(each of N people shakes the hands of N-1 other people, but this double-counts handshakes so divide by 2):
However, for very large numbers of people, the linear term N
is dwarfed and effectively contributes 0 to the ratio (in the chart: the fraction of empty boxes on the diagonal over total boxes gets smaller as the number of participants becomes larger). Therefore the scaling behavior is order N²
, or the number of handshakes "grows like N²".
#handshakes(N)
────────────── ≈ 1/2
N²
It's as if the empty boxes on the diagonal of the chart (N*(N-1)/2 checkmarks) weren't even there (N² checkmarks asymptotically).
(temporary digression from "plain English":) If you wanted to prove this to yourself, you could perform some simple algebra on the ratio to split it up into multiple terms (lim
means "considered in the limit of", just ignore it if you haven't seen it, it's just notation for "and N is really really big"):
lim (N→∞) (N²/2 - N/2)/N² = lim (N→∞) [ (N²/2)/N² - (N/2)/N² ] = lim (N→∞) (1/2 - 1/(2N)) = 1/2
The second term, (N/2)/N² = 1/(2N), is 0 in the limit of N→∞: graph it, or plug in a really large number for N.
tl;dr: The number of handshakes 'looks like' x² so much for large values, that if we were to write down the ratio #handshakes/x², the fact that we don't need exactly x² handshakes wouldn't even show up in the decimal for an arbitrarily large while.
e.g. for x=1million, ratio #handshakes/x²: 0.499999...
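A quick way to watch the ratio approach 1/2 is to compute it. This is a minimal sketch; `handshakes` and `handshakes_brute_force` are names made up for illustration.

```python
from itertools import combinations

def handshakes(n):
    """Exact number of handshakes among n people: n choose 2."""
    return n * (n - 1) // 2

def handshakes_brute_force(n):
    """Count the pairs one by one, to confirm the closed form on small n."""
    return sum(1 for _ in combinations(range(n), 2))

if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        # The ratio creeps toward 0.5 as n grows.
        print(n, handshakes(n) / n**2)
```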
Building Intuition
This lets us make statements like...
"For large enough inputsize=N, no matter what the constant factor is, if I double the input size...
O(N): N → (2N) = 2(N), so the time doubles.
O(N²): N² → (2N)² = 4(N²), so the time quadruples.
O(N³): cN³ → c(2N)³ = 8(cN³), so the time increases eightfold.
O(log(N)): c log(N) → c log(2N) = (c log(2))+(c log(N)) = (fixed amount)+(c log(N)), so the time grows by a fixed amount.
O(1): c*1 → c*1, so the time does not change.
O(N log(N)): c 2N log(2N) / c N log(N) (here we divide f(2N)/f(N), but we could instead have massaged the expression and factored out cN log(N) as above)
→ 2 log(2N)/log(N)
→ 2 (log(2) + log(N))/log(N)
→ 2*(1 + (log₂N)⁻¹) (basically 2 for large N; eventually less than 2.000001)
(alternatively, say log(N) will always be below like 17 for your data so it's O(17 N) which is linear; that is neither rigorous nor sensible, though)
O(2^N): 2^N → 2^(2N) = 4^N ............ put another way ...... 2^N → 2^(N+1) = 2^N * 2^1 = 2 * 2^N
(with credit to https://stackoverflow.com/a/487292/711085 )
(technically the constant factor could maybe matter in some more esoteric examples, but I've phrased things above (e.g. in log(N)) such that it doesn't)
These are the bread-and-butter orders of growth that programmers and applied computer scientists use as reference points. They see these all the time. (So while you could technically think "Doubling the input makes an O(√N) algorithm 1.414 times slower," it's better to think of it as "this is worse than logarithmic but better than linear".)
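The doubling rules can be reproduced with a few lines of arithmetic. The sketch below evaluates f(2N)/f(N) for the common growth functions; the `GROWTH` table and `doubling_ratio` are illustrative names, and base-2 logarithms are an arbitrary choice.

```python
import math

# Common orders of growth, written as plain functions of the input size.
GROWTH = {
    "1": lambda n: 1.0,
    "log N": lambda n: math.log2(n),
    "N": lambda n: float(n),
    "N log N": lambda n: n * math.log2(n),
    "N^2": lambda n: float(n) ** 2,
    "N^3": lambda n: float(n) ** 3,
}

def doubling_ratio(name, n):
    """How many times larger the cost becomes when the input doubles."""
    f = GROWTH[name]
    return f(2 * n) / f(n)

if __name__ == "__main__":
    for name in GROWTH:
        print(name, round(doubling_ratio(name, 2**20), 3))
```

Note that for log N the interesting quantity is the difference f(2N) - f(N), which is a fixed amount, rather than the ratio.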
Constant factors
Usually, we don't care what the specific constant factors are, because they don't affect the way the function grows. For example, two algorithms may both take O(N)
time to complete, but one may be twice as slow as the other. We usually don't care too much unless the factor is very large, since optimizing is tricky business (see "When is optimisation premature?"); also, the mere act of picking an algorithm with a better big-O will often improve performance by orders of magnitude.
Some asymptotically superior algorithms (e.g. a non-comparison O(N log(log(N)))
sort) can have so large a constant factor (e.g. 100000*N log(log(N))
), or overhead that is relatively large like O(N log(log(N)))
with a hidden + 100*N
, that they are rarely worth using even on "big data".
Why O(N) is sometimes the best you can do, i.e. why we need datastructures
O(N)
algorithms are in some sense the "best" algorithms if you need to read all your data. The very act of reading a bunch of data is an O(N)
operation. Loading it into memory is usually O(N)
(or faster if you have hardware support, or no time at all if you've already read the data). However, if you touch or even look at every piece of data (or even every other piece of data), your algorithm will take O(N)
time to perform this looking. No matter how long your actual algorithm takes, it will be at least O(N)
because it spent that time looking at all the data.
The same can be said for the very act of writing. All algorithms which print out N things will take at least N time because the output is at least that long (e.g. printing out all permutations (ways to rearrange) of a set of N playing cards is factorial: O(N!)
(which is why in those cases, good programs will ensure an iteration uses O(1) memory and doesn't print or store every intermediate step)).
This motivates the use of data structures: a data structure requires reading the data only once (usually O(N)
time), plus some arbitrary amount of preprocessing (e.g. O(N)
or O(N log(N))
or O(N²)
) which we try to keep small. Thereafter, modifying the data structure (insertions/deletions/ etc.) and making queries on the data take very little time, such as O(1)
or O(log(N))
. You then proceed to make a large number of queries! In general, the more work you're willing to do ahead of time, the less work you'll have to do later on.
For example, say you had the latitude and longitude coordinates of millions of road segments and wanted to find all street intersections.
A naive scan that checks every segment against a given location takes O(N) work. That is fine if you do it only once, but if you want to do it many times (in this case, N times, once for each segment), we'd have to do O(N²) work, or 1000000²=1000000000000 operations. Not good (a modern computer can perform about a billion operations per second).
If we instead use a hash table, we pay a small cost up front by preprocessing everything in O(N) time. Thereafter, it only takes constant time on average to look up something by its key (in this case, our key is the latitude and longitude coordinates, rounded into a grid; we search the adjacent gridspaces of which there are only 9, which is a constant).
Our task went from an infeasible O(N²) to a manageable O(N), and all we had to do was pay a minor cost to make a hash table.
The moral of the story: a data structure lets us speed up operations. Even more, advanced data structures can let you combine, delay, or even ignore operations in incredibly clever ways. Different problems would have different analogies, but they'd all involve organizing the data in a way that exploits some structure we care about, or which we've artificially imposed on it for bookkeeping. We do work ahead of time (basically planning and organizing), and now repeated tasks are much much easier!
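The grid-plus-hash-table idea might look roughly like the following. This is a sketch under assumptions: points stand in for road segments, the cell size `CELL` is an arbitrary illustrative value, and `cell_of`, `build_grid`, and `nearby` are hypothetical helpers.

```python
from collections import defaultdict

CELL = 0.01  # grid cell size in degrees; an illustrative choice, not a recommendation

def cell_of(lat, lon):
    """Round a coordinate down to the grid cell that contains it."""
    return (int(lat // CELL), int(lon // CELL))

def build_grid(points):
    """O(N) preprocessing: bucket every point by its grid cell."""
    grid = defaultdict(list)
    for point in points:
        grid[cell_of(*point)].append(point)
    return grid

def nearby(grid, lat, lon):
    """Collect the points stored in the 3x3 block of cells around (lat, lon)."""
    row, col = cell_of(lat, lon)
    found = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            # 9 dictionary lookups, regardless of how many points exist overall.
            found.extend(grid.get((row + d_row, col + d_col), []))
    return found
```

Building the grid touches each point once; each later query inspects only 9 buckets instead of all N points.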
Practical example: visualizing orders of growth while coding
Asymptotic notation is, at its core, quite separate from programming. Asymptotic notation is a mathematical framework for thinking about how things scale and can be used in many different fields. That said... this is how you apply asymptotic notation to coding.
The basics: Whenever we interact with every element in a collection of size A (such as an array, a set, all keys of a map, etc.), or perform A iterations of a loop, that is a multiplicative factor of size A. Why do I say "a multiplicative factor"?--because loops and functions (almost by definition) have multiplicative running time: the number of iterations, times work done in the loop (or for functions: the number of times you call the function, times work done in the function). (This holds if we don't do anything fancy, like skip loops or exit the loop early, or change control flow in the function based on arguments, which is very common.) Here are some examples of visualization techniques, with accompanying pseudocode.
(here, the xs represent constant-time units of work, processor instructions, interpreter opcodes, whatever)
Example 1:
for(i=0; i<A; i++)        // A * ...
    some O(1) operation   // 1
--> A*1 --> O(A) time
visualization:
|<------ A ------->|
1 2 3 4 5 x x ... x
other languages, multiplying orders of growth:
javascript, O(A) time and space
someListOfSizeA.map((x,i) => [x,i])
python, O(rows*cols) time and space
[[r*c for c in range(cols)] for r in range(rows)]
Example 2:
for every x in listOfSizeA:        // A * (...
    some O(1) operation            // 1
    some O(B) operation            // B
    for every y in listOfSizeC:    // C * (...
        some O(1) operation        // 1))
--> O(A*(1 + B + C))
    O(A*(B+C))        (1 is dwarfed)
visualization:
|<------ A ------->|
1 x x x x x x ... x
2 x x x x x x ... x ^
3 x x x x x x ... x |
4 x x x x x x ... x |
5 x x x x x x ... x B <-- A*B
x x x x x x x ... x |
................... |
x x x x x x x ... x v
x x x x x x x ... x ^
x x x x x x x ... x |
x x x x x x x ... x |
x x x x x x x ... x C <-- A*C
x x x x x x x ... x |
................... |
x x x x x x x ... x v
Example 3:
function nSquaredFunction(n) {
    total = 0
    for i in 1..n:          // N *
        for j in 1..n:      // N *
            total += i*j    // 1
    return total
}
// O(n^2)
function nCubedFunction(a) {
    for i in 1..a:                    // A *
        print(nSquaredFunction(a))    // A^2
}
// O(a^3)
If we do something slightly complicated, you might still be able to imagine visually what's going on:
for x in range(A):
    for y in range(x):
        simpleOperation(x*y)
x x x x x x x x x x |
x x x x x x x x x |
x x x x x x x x |
x x x x x x x |
x x x x x x |
x x x x x |
x x x x |
x x x |
x x |
x___________________|
Here, the smallest recognizable outline you can draw is what matters; a triangle is a two dimensional shape (0.5 A^2), just like a square is a two-dimensional shape (A^2); the constant factor of two here remains in the asymptotic ratio between the two, however, we ignore it like all factors... (There are some unfortunate nuances to this technique I don't go into here; it can mislead you.)
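Counting the iterations directly shows the factor of two between the triangle and the square. The sketch assumes the inner loop of the triangular version runs x times for each x; `count_square` and `count_triangle` are illustrative names.

```python
def count_square(a):
    """Units of work done by two full nested loops of size a."""
    ops = 0
    for x in range(a):
        for y in range(a):
            ops += 1
    return ops

def count_triangle(a):
    """Units of work when the inner loop only runs up to the outer index."""
    ops = 0
    for x in range(a):
        for y in range(x):
            ops += 1
    return ops

if __name__ == "__main__":
    for a in (10, 100, 1000):
        # The ratio tends to 0.5: a constant factor, so both are O(A^2).
        print(a, count_triangle(a) / count_square(a))
```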
Of course this does not mean that loops and functions are bad; on the contrary, they are the building blocks of modern programming languages, and we love them. However, we can see that the way we weave loops and functions and conditionals together with our data (control flow, etc.) mimics the time and space usage of our program! If time and space usage becomes an issue, that is when we resort to cleverness and find an easy algorithm or data structure we hadn't considered, to reduce the order of growth somehow. Nevertheless, these visualization techniques (though they don't always work) can give you a naive guess at a worst-case running time.
Here is another thing we can recognize visually:
<----------------------------- N ----------------------------->
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x x x x x x x x x x x x x x
x x x x x x x x
x x x x
x x
x
We can just rearrange this and see it's O(N):
<----------------------------- N ----------------------------->
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
x x x x x x x x x x x x x x x x|x x x x x x x x|x x x x|x x|x
Or maybe you do log(N) passes of the data, for O(N*log(N)) total time:
<----------------------------- N ----------------------------->
^ x x x x x x x x x x x x x x x x|x x x x x x x x x x x x x x x x
| x x x x x x x x|x x x x x x x x|x x x x x x x x|x x x x x x x x
lgN x x x x|x x x x|x x x x|x x x x|x x x x|x x x x|x x x x|x x x x
| x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x|x x
v x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x|x
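Both pictures can be turned into sums. The sketch below assumes N is a power of two, as in the diagrams; `halving_work` and `passes_work` are hypothetical names.

```python
import math

def halving_work(n):
    """Total work when each step touches half as many items as the previous one."""
    total = 0
    while n >= 1:
        total += n
        n //= 2
    return total

def passes_work(n):
    """Total work for log2(n) full passes over n items (n a power of two)."""
    return n * int(math.log2(n))

if __name__ == "__main__":
    n = 32
    print(halving_work(n))  # stays below 2*n, hence O(N)
    print(passes_work(n))   # n * log2(n), hence O(N log N)
```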
Unrelatedly but worth mentioning again: If we perform a hash (e.g. a dictionary/hashtable lookup), that is a factor of O(1). That's pretty fast.
[myDictionary.has(x) for x in listOfSizeA]
\----- O(1) ------/
--> A*1 --> O(A)
If we do something very complicated, such as with a recursive function or divide-and-conquer algorithm, you can use the Master Theorem (usually works) or, in ridiculous cases, the Akra-Bazzi Theorem (almost always works); alternatively, you look up the running time of your algorithm on Wikipedia.
But, programmers don't think like this because eventually, algorithm intuition just becomes second nature. You will start to code something inefficient and immediately think "am I doing something grossly inefficient?". If the answer is "yes" AND you foresee it actually mattering, then you can take a step back and think of various tricks to make things run faster (the answer is almost always "use a hashtable", rarely "use a tree", and very rarely something a bit more complicated).
Amortized and average-case complexity
There is also the concept of "amortized" and/or "average case" (note that these are different).
Average Case: This is no more than using big-O notation for the expected value of a function, rather than the function itself. In the usual case where you consider all inputs to be equally likely, the average case is just the average of the running time. For example with quicksort, even though the worst-case is O(N^2)
for some really bad inputs, the average case is the usual O(N log(N))
(the really bad inputs are very small in number, so few that we don't notice them in the average case).
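The gap between worst case and average case can be seen by counting comparisons. This is a sketch of one simple quicksort variant (first element as pivot, which is an assumption chosen because it makes already-sorted input the bad case); `quicksort_comparisons` is a hypothetical helper.

```python
import random

def quicksort_comparisons(items):
    """Count partitioning comparisons for a quicksort that uses the first element as pivot."""
    if len(items) <= 1:
        return 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    # Count one comparison against the pivot for every remaining element.
    return len(rest) + quicksort_comparisons(smaller) + quicksort_comparisons(larger)

if __name__ == "__main__":
    n = 200
    already_sorted = list(range(n))
    shuffled = list(range(n))
    random.Random(0).shuffle(shuffled)
    print(quicksort_comparisons(already_sorted))  # quadratic: n*(n-1)/2
    print(quicksort_comparisons(shuffled))        # far fewer on a typical input
```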
Amortized Worst-Case: Some data structures may have a worst-case complexity that is large, but guarantee that if you do many of these operations, the average amount of work you do will be better than worst-case. For example, you may have a data structure that normally takes constant O(1)
time. However, occasionally it will 'hiccup' and take O(N)
time for one random operation, because maybe it needs to do some bookkeeping or garbage collection or something... but it promises you that if it does hiccup, it won't hiccup again for N more operations. The worst-case cost is still O(N)
per operation, but the amortized cost over many runs is O(N)/N
= O(1)
per operation. Because the big operations are sufficiently rare, the massive amount of occasional work can be considered to blend in with the rest of the work as a constant factor. We say the work is "amortized" over a sufficiently large number of calls that it disappears asymptotically.
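A growable array that doubles its capacity is a standard example of this behavior. The sketch below only simulates the costs (it does not store anything); the cost model, one unit per element written or copied, and the name `append_costs` are assumptions for illustration.

```python
def append_costs(n):
    """Cost of each of n appends to an array that doubles its capacity when full."""
    costs = []
    size, capacity = 0, 1
    for _ in range(n):
        if size == capacity:
            capacity *= 2
            costs.append(size + 1)  # copy `size` old elements, then write the new one
        else:
            costs.append(1)  # just write the new element
        size += 1
    return costs

if __name__ == "__main__":
    costs = append_costs(1000)
    print(max(costs))               # the occasional "hiccup" is O(N)
    print(sum(costs) / len(costs))  # but the average per append stays constant
```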
The analogy for amortized analysis:
You drive a car. Occasionally, you need to spend 10 minutes going to the gas station and then spend 1 minute refilling the tank with gas. If you did this every time you went anywhere with your car (spend 10 minutes driving to the gas station, spend a few seconds filling up a fraction of a gallon), it would be very inefficient. But if you fill up the tank once every few days, the 11 minutes spent driving to the gas station is "amortized" over a sufficiently large number of trips, that you can ignore it and pretend all your trips were maybe 5% longer.
Comparison between average-case and amortized worst-case:
Though, if you're reasonably worried about an attacker, there are many other algorithmic attack vectors to worry about besides amortization and average-case.
Both average-case and amortization are incredibly useful tools for thinking about and designing with scaling in mind.
(See Difference between average case and amortized analysis if interested in this subtopic.)
Multidimensional big-O
Most of the time, people don't realize that there's more than one variable at work. For example, in a string-search algorithm, your algorithm may take time O([length of text] + [length of query])
, i.e. it is linear in two variables like O(N+M)
. Other more naive algorithms may be O([length of text]*[length of query])
or O(N*M)
. Ignoring multiple variables is one of the most common oversights I see in algorithm analysis, and can handicap you when designing an algorithm.
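To see two variables at work, here is a sketch of the naive O(N*M) string search that counts character comparisons. The function name and the worst-case input (a text of repeated "a" and a query ending in "b") are illustrative choices.

```python
def naive_search_comparisons(text, query):
    """Return (match_positions, character_comparisons) for a naive substring search."""
    n, m = len(text), len(query)
    positions, comparisons = [], 0
    for start in range(n - m + 1):
        matched = 0
        while matched < m:
            comparisons += 1
            if text[start + matched] != query[matched]:
                break
            matched += 1
        if matched == m:
            positions.append(start)
    return positions, comparisons

if __name__ == "__main__":
    # Worst case: every alignment matches until the very last character.
    print(naive_search_comparisons("a" * 20, "aaaab"))
```

Doubling either the text length or the query length roughly doubles the work in this worst case, which is why the cost is written with both variables.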
The whole story
Keep in mind that big-O is not the whole story. You can drastically speed up some algorithms by using caching, making them cache-oblivious, avoiding bottlenecks by working with RAM instead of disk, using parallelization, or doing work ahead of time -- these techniques are often independent of the order-of-growth "big-O" notation, though you will often see the number of cores in the big-O notation of parallel algorithms.
Also keep in mind that due to hidden constraints of your program, you might not really care about asymptotic behavior. You may be working with a bounded number of values, for example:
If you are sorting only a handful of elements, you don't want to use the O(N log(N)) quicksort; you want to use insertion sort, which happens to perform well on small inputs. These situations often come up in divide-and-conquer algorithms, where you split up the problem into smaller and smaller subproblems, such as recursive sorting, fast Fourier transforms, or matrix multiplication.
In practice, even among algorithms which have the same or similar asymptotic performance, their relative merit may actually be driven by other things, such as: other performance factors (quicksort and mergesort are both O(N log(N)), but quicksort takes advantage of CPU caches); non-performance considerations, like ease of implementation; whether a library is available, and how reputable and maintained the library is.
Programs will also run slower on a 500MHz computer vs 2GHz computer. We don't really consider this as part of the resource bounds, because we think of the scaling in terms of machine resources (e.g. per clock cycle), not per real second. However, there are similar things which can 'secretly' affect performance, such as whether you are running under emulation, or whether the compiler optimized code or not. These might make some basic operations take longer (even relative to each other), or even speed up or slow down some operations asymptotically (even relative to each other). The effect may be small or large between different implementation and/or environment. Do you switch languages or machines to eke out that little extra work? That depends on a hundred other reasons (necessity, skills, coworkers, programmer productivity, the monetary value of your time, familiarity, workarounds, why not assembly or GPU, etc...), which may be more important than performance.
The above issues, like the effect of the choice of which programming language is used, are almost never considered as part of the constant factor (nor should they be); yet one should be aware of them because sometimes (though rarely) they may affect things. For example in CPython, the native priority queue implementation is asymptotically non-optimal (O(log(N)) rather than O(1) for your choice of insertion or find-min); do you use another implementation? Probably not, since the C implementation is probably faster, and there are probably other similar issues elsewhere. There are tradeoffs; sometimes they matter and sometimes they don't.
(edit: The "plain English" explanation ends here.)
Math addenda
For completeness, the precise definition of big-O notation is as follows: f(x) ∈ O(g(x))
means that "f is asymptotically upper-bounded by const*g": ignoring everything below some finite value of x, there exists a constant such that |f(x)| ≤ const * |g(x)|
. (The other symbols are as follows: just like O
means ≤, Ω
means ≥. There are lowercase variants: o
means <, and ω
means >.) f(x) ∈ Ɵ(g(x))
means both f(x) ∈ O(g(x))
and f(x) ∈ Ω(g(x))
(upper- and lower-bounded by g): there exist some constants such that f will always lie in the "band" between const1*g(x) and const2*g(x). It is the strongest asymptotic statement you can make and roughly equivalent to ==
. (Sorry, I elected to delay the mention of the absolute-value symbols until now, for clarity's sake; especially because I have never seen negative values come up in a computer science context.)
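The definition can be spot-checked numerically. The sketch below tests the inequality |f(x)| ≤ const * |g(x)| over a finite range of integers, which is evidence rather than a proof; `is_bounded`, the sample functions `f` and `g`, and the cutoff `x_max` are all assumptions made for illustration.

```python
def is_bounded(f, g, const, x0, x_max=10_000):
    """Check |f(x)| <= const * |g(x)| for every integer x in [x0, x_max]."""
    return all(abs(f(x)) <= const * abs(g(x)) for x in range(x0, x_max + 1))

# Example from earlier in the answer: 10x² - x + 2 compared with x².
f = lambda x: 10 * x**2 - x + 2
g = lambda x: x**2

if __name__ == "__main__":
    print(is_bounded(f, g, const=11, x0=1))                     # a valid witness for f ∈ O(g)
    print(is_bounded(lambda x: x**3, g, const=100, x0=1))       # x³ eventually escapes 100*x²
```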
People will often use = O(...)
, which is perhaps the more correct 'comp-sci' notation, and entirely legitimate to use; "f = O(...)" is read "f is order ... / f is xxx-bounded by ..." and is thought of as "f is some expression whose asymptotics are ...". I was taught to use the more rigorous ∈ O(...)
. ∈
means "is an element of" (still read as before). In this particular case, O(N²)
contains elements like {2 N²
, 3 N²
, 1/2 N²
, 2 N² + log(N)
, - N² + N^1.9
, ...} and is infinitely large, but it's still a set.
O and Ω are not symmetric (n = O(n²), but n² is not O(n)), but Ɵ is symmetric, and (since these relations are all transitive and reflexive) Ɵ, therefore, is symmetric and transitive and reflexive, and therefore partitions the set of all functions into equivalence classes. An equivalence class is a set of things that we consider to be the same. That is to say, given any function you can think of, it will belong to a family of similar functions[see my Note in comments]; just like you can group all integers into odds or evens, you can group all functions with Ɵ into x-ish, log(x)^2-ish, etc... by basically ignoring smaller terms (but sometimes you might be stuck with more complicated functions which are separate classes unto themselves).
The =
notation might be the more common one and is even used in papers by world-renowned computer scientists. Additionally, it is often the case that in a casual setting, people will say O(...)
when they mean Ɵ(...)
; this is technically true since the set of things Ɵ(exactlyThis)
is a subset of O(noGreaterThanThis)
... and it's easier to type. ;-)
Upvotes: 451
Reputation: 15958
Simplest way to look at it (in plain English)
We are trying to see how the number of input parameters affects the running time of an algorithm. If the running time of your application is proportional to the number of input parameters, then it is said to be in Big O of n.
The above statement is a good start but not completely true.
A more accurate explanation (mathematical)
Suppose
n=number of input parameters
T(n)= The actual function that expresses the running time of the algorithm as a function of n
c= a constant
f(n)= An approximate function that expresses the running time of the algorithm as a function of n
Then as far as Big O is concerned, the approximation f(n) is considered good enough as long as the below condition is true.
T(n) ≤ c×f(n) as n→∞
The condition is read as "As n approaches infinity, T of n is less than or equal to c times f of n."
If the above condition is true, then the running time of the algorithm is said to be in Big O of f(n). For example, when f(n) = n, this is written in big O notation as
T(n)∈O(n)
This is read as "T of n is in big O of n."
Back to English
Based on the mathematical definition above, if you say your algorithm is a Big O of n, it means it is a function of n (number of input parameters) or faster. If your algorithm is Big O of n, then it is also automatically the Big O of n square.
Big O of n means my algorithm runs at least as fast as this. You cannot look at the Big O notation of your algorithm and say it's slow. You can only say it's fast.
Check this out for a video tutorial on Big O from UC Berkeley. It is actually a simple concept. If you hear Professor Shewchuk (aka God level teacher) explaining it, you will say "Oh, that's all it is!".
Upvotes: 9
Reputation: 2436
What is a plain English explanation of “Big O” notation?
I would like to stress that the driving motive for “Big O” notation is one thing: when the input size of an algorithm gets very big, some parts (i.e. constants, coefficients, terms) of the equation describing the cost of the algorithm become so insignificant that we ignore them. The part of the equation that survives after ignoring them is termed the “Big O” notation of the algorithm.
So if the input size is NOT very big, the idea of “Big O” notation (upper bound) will be unimportant.
int sumArray(int[] nums) {
    int sum = 0;                              // here we've 1 operation
    for (int i = 0; i < nums.length; i++) {   // we've n times
        sum += nums[i];                       // taking initialization and assignments, 3 ops
    }
    return sum;
}
In above algorithm, let's say you find out T(n)
as follows (time complexity):
T(n) = 3*n + 2
To find its “Big O” notation, we need to consider very big input size:
n= 1,000,000 -> T(1,000,000) = 3,000,002
n=1,000,000,000 -> T(1,000,000,000) = 3,000,000,002
n=10,000,000,000 -> T(10,000,000,000) = 30,000,000,002
Let's give the same inputs to another function, F(n) = n
n= 1,000,000 -> F(1,000,000) = 1,000,000
n=1,000,000,000 -> F(1,000,000,000) = 1,000,000,000
n=10,000,000,000 -> F(10,000,000,000) = 10,000,000,000
As you can see, as the input size gets very big, T(n) stays proportional to F(n): the constant 2 becomes insignificant, and the coefficient 3 is only a constant factor, which Big O ignores. This is where the idea of “Big O” notation comes in:
T(n) ∈ O(F(n))
T(n) ∈ O(n)
We say T(n) is Big O of n: n is an upper bound on T(n), up to a constant factor, as n gets very big. The same steps apply to other algorithms.
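The numbers above can be reproduced with a short sketch. It takes T(n) = 3*n + 2 from the answer as given and shows that T(n)/F(n) settles at the coefficient 3, so a single constant (4 is one arbitrary choice that works from n = 2 onward) bounds T(n) by a multiple of n.

```python
def T(n):
    """Operation count taken from the answer: 3 per loop iteration plus 2."""
    return 3 * n + 2

def F(n):
    """The candidate bound: plain n."""
    return n

if __name__ == "__main__":
    for n in (1_000_000, 1_000_000_000, 10_000_000_000):
        # The ratio approaches 3; the "+ 2" stops mattering.
        print(n, T(n), T(n) / F(n))
```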
Upvotes: 4
Reputation: 16229
It shows how an algorithm scales based on input size.
O(n²): known as Quadratic complexity
Notice that the number of items increases by a factor of 10, but the time increases by a factor of 10². Basically, n=10 and so O(n²) gives us the scaling factor n² which is 10².
O(n): known as Linear complexity
This time the number of items increases by a factor of 10, and so does the time. n=10 and so O(n)'s scaling factor is 10.
O(1): known as Constant complexity
The number of items is still increasing by a factor of 10, but the scaling factor of O(1) is always 1.
O(log n): known as Logarithmic complexity
The number of computations is only increased by a log of the input value. So in this case, assuming each computation takes 1 second, the log of the input n is the time required, hence log n.
That's the gist of it. They reduce the maths down so it might not be exactly n² or whatever they say it is, but that'll be the dominating factor in the scaling.
Upvotes: 815
Reputation: 625307
Quick note, my answer is almost certainly confusing Big Oh notation (which is an upper bound) with Big Theta notation "Θ" (which is a two-side bound). But in my experience, this is actually typical of discussions in non-academic settings. Apologies for any confusion caused.
BigOh complexity can be visualized with this graph:
The simplest definition I can give for Big Oh notation is this:
Big Oh notation is a relative representation of the complexity of an algorithm.
There are some important and deliberately chosen words in that sentence:
- relative: you can only compare apples to apples. You can't compare an algorithm that does arithmetic multiplication to an algorithm that sorts a list of integers. But a comparison of two algorithms to do arithmetic operations (one multiplication, one addition) will tell you something meaningful;
- representation: BigOh (in its simplest form) reduces the comparison between algorithms to a single variable. That variable is chosen based on observations or assumptions. For example, sorting algorithms are typically compared based on comparison operations (comparing two nodes to determine their relative ordering). This assumes that comparison is expensive. But what if the comparison is cheap but swapping is expensive? It changes the comparison; and
- complexity: if it takes me one second to sort 10,000 elements, how long will it take me to sort one million? Complexity in this instance is a relative measure to something else.
Come back and reread the above when you've read the rest.
The best example of BigOh I can think of is doing arithmetic. Take two numbers (123456 and 789012). The basic arithmetic operations we learned in school were:
- addition;
- subtraction;
- multiplication; and
- division.
Each of these is an operation or a problem. A method of solving these is called an algorithm.
The addition is the simplest. You line the numbers up (to the right) and add the digits in a column writing the last number of that addition in the result. The 'tens' part of that number is carried over to the next column.
Let's assume that the addition of these numbers is the most expensive operation in this algorithm. It stands to reason that to add these two numbers together we have to add together 6 digits (and possibly carry a 7th). If we add two 100 digit numbers together we have to do 100 additions. If we add two 10,000 digit numbers we have to do 10,000 additions.
See the pattern? The complexity (being the number of operations) is directly proportional to the number of digits n in the larger number. We call this O(n) or linear complexity.
Subtraction is similar (except you may need to borrow instead of carry).
Multiplication is different. You line the numbers up, take the first digit in the bottom number and multiply it in turn against each digit in the top number and so on through each digit. So to multiply our two 6 digit numbers we must do 36 multiplications. We may need to do as many as 10 or 11 column adds to get the end result too.
If we have two 100-digit numbers we need to do 10,000 multiplications and 200 adds. For two one million digit numbers we need to do one trillion (10¹²) multiplications and two million adds.
As the algorithm scales with n-squared, this is O(n²) or quadratic complexity. This is a good time to introduce another important concept:
We only care about the most significant portion of complexity.
The astute may have realized that we could express the number of operations as: n² + 2n. But as you saw from our example with two numbers of a million digits apiece, the second term (2n) becomes insignificant (accounting for 0.0002% of the total operations by that stage).
One can notice that we've assumed the worst case scenario here. While multiplying 6 digit numbers, if one of them has 4 digits and the other one has 6 digits, then we only have 24 multiplications. Still, we calculate the worst case scenario for that 'n', i.e when both are 6 digit numbers. Hence Big Oh notation is about the Worst-case scenario of an algorithm.
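The schoolbook multiplication described above can be sketched so that it counts its own single-digit multiplications. `long_multiply` is a hypothetical helper written for illustration; it works on decimal digits of non-negative integers.

```python
def long_multiply(a, b):
    """Multiply two non-negative integers digit by digit.

    Returns (product, number_of_single_digit_multiplications).
    """
    a_digits = [int(d) for d in reversed(str(a))]  # least significant digit first
    b_digits = [int(d) for d in reversed(str(b))]
    result = [0] * (len(a_digits) + len(b_digits))
    multiplications = 0
    for i, db in enumerate(b_digits):
        carry = 0
        for j, da in enumerate(a_digits):
            multiplications += 1
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(a_digits)] += carry
    product = int("".join(str(d) for d in reversed(result)))
    return product, multiplications

if __name__ == "__main__":
    print(long_multiply(123456, 789012))
```

Two 6-digit inputs need 36 digit multiplications, and a 4-digit by 6-digit input needs 24, matching the counts in the text.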
The next best example I can think of is the telephone book, normally called the White Pages or similar but it varies from country to country. But I'm talking about the one that lists people by surname and then initials or first name, possibly address and then telephone numbers.
Now if you were instructing a computer to look up the phone number for "John Smith" in a telephone book that contains 1,000,000 names, what would you do? Ignoring the fact that you could guess how far in the S's started (let's assume you can't), what would you do?
A typical implementation might be to open up to the middle, take the 500,000th and compare it to "Smith". If it happens to be "Smith, John", we just got really lucky. Far more likely is that "John Smith" will be before or after that name. If it's after we then divide the last half of the phone book in half and repeat. If it's before then we divide the first half of the phone book in half and repeat. And so on.
This is called a binary search and is used every day in programming whether you realize it or not.
So if you want to find a name in a phone book of a million names you can actually find any name by doing this at most 20 times. In comparing search algorithms we decide that this comparison is our 'n'.
- For a phone book of 3 names it takes 2 comparisons (at most).
- For 7 it takes at most 3.
- For 15 it takes 4.
- …
- For 1,000,000 it takes 20.
That is staggeringly good, isn't it?
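The numbers in the list can be reproduced with a small sketch. Assumptions: the "phone book" is just a sorted sequence of integers, and we search for the last entry, which always lands in the larger half and is therefore a worst case for this particular implementation.

```python
def binary_search(sorted_names, target):
    """Return (index or None, how many entries were looked at)."""
    low, high = 0, len(sorted_names) - 1
    looked_at = 0
    while low <= high:
        middle = (low + high) // 2
        looked_at += 1
        if sorted_names[middle] == target:
            return middle, looked_at
        if sorted_names[middle] < target:
            low = middle + 1    # keep the second half
        else:
            high = middle - 1   # keep the first half
    return None, looked_at


def lookups_for_last_entry(size):
    book = range(size)          # stand-in for a sorted phone book
    return binary_search(book, size - 1)[1]


for size in (3, 7, 15, 1_000_000):
    print(size, lookups_for_last_entry(size))
```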
In BigOh terms this is O(log n) or logarithmic complexity. Now the logarithm in question could be ln (base e), log10, log2 or some other base. It doesn't matter: it's still O(log n), just like O(2n^2) and O(100n^2) are still both O(n^2).
It's worthwhile at this point to explain that BigOh can be used to determine three cases with an algorithm:
- Best Case: In the telephone book search, the best case is that we find the name in one comparison. This is O(1) or constant complexity;
- Expected Case: As discussed above this is O(log n); and
- Worst Case: This is also O(log n).
Normally we don't care about the best case. We're interested in the expected and worst case. Sometimes one or the other of these will be more important.
Back to the telephone book.
What if you have a phone number and want to find a name? The police have a reverse phone book but such look-ups are denied to the general public. Or are they? Technically you can reverse look-up a number in an ordinary phone book. How?
You start at the first name and compare the number. If it's a match, great, if not, you move on to the next. You have to do it this way because the phone book is unordered (by phone number anyway).
So to find a name given the phone number (reverse lookup):
- Best Case: O(1);
- Expected Case: O(n) (for 500,000); and
- Worst Case: O(n) (for 1,000,000).
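A minimal sketch of the reverse lookup, using a made-up 100-entry book (the names and numbers are invented for illustration). It counts how many entries are checked in the best case, the worst case, and on average.

```python
def reverse_lookup(phone_book, number):
    """phone_book is a list of (name, number) pairs ordered by name.

    Returns (name or None, how many entries were checked).
    """
    for checked, (name, entry_number) in enumerate(phone_book, start=1):
        if entry_number == number:
            return name, checked
    return None, len(phone_book)


# Hypothetical book: ordered by name, numbers in no particular order.
book = [(f"name{i:03d}", 5550000 + (i * 37) % 100) for i in range(100)]

best = reverse_lookup(book, book[0][1])[1]
worst = reverse_lookup(book, book[-1][1])[1]
average = sum(reverse_lookup(book, num)[1] for _, num in book) / len(book)
print(best, worst, average)
```

The average is about half the book, which is still proportional to n, hence O(n).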
This is quite a famous problem in computer science and deserves a mention. In this problem, you have N towns. Each of those towns is linked to 1 or more other towns by a road of a certain distance. The Traveling Salesman problem is to find the shortest tour that visits every town.
Sounds simple? Think again.
If you have 3 towns A, B, and C with roads between all pairs then you could go:
- A → B → C
- A → C → B
- B → C → A
- B → A → C
- C → A → B
- C → B → A
Well, actually there are fewer than that because some of these are equivalent (A → B → C and C → B → A are equivalent, for example, because they use the same roads, just in reverse).
In actuality, there are 3 possibilities.
- Take this to 4 towns and you have (iirc) 12 possibilities.
- With 5 it's 60.
- 6 becomes 360.
This is a function of a mathematical operation called a factorial. Basically:
- 5! = 5 × 4 × 3 × 2 × 1 = 120
- 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
- 7! = 7 × 6 × 5 × 4 × 3 × 2 × 1 = 5040
- …
- 25! = 25 × 24 × … × 2 × 1 = 15,511,210,043,330,985,984,000,000
- …
- 50! = 50 × 49 × … × 2 × 1 = 3.04140932 × 10^64
So the BigOh of the Traveling Salesman problem is O(n!) or factorial or combinatorial complexity.
By the time you get to 200 towns there isn't enough time left in the universe to solve the problem with traditional computers.
Something to think about.
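The route counts quoted earlier (3, 12, 60, 360) can be checked by brute force. This sketch assumes every pair of towns is connected and treats a route and its reverse as the same route; the function name is made up for the example.

```python
from itertools import permutations
from math import factorial


def distinct_routes(towns):
    """Count orderings of all towns, counting a route and its reverse once."""
    seen = set()
    for route in permutations(towns):
        if route[::-1] not in seen:
            seen.add(route)
    return len(seen)


for n in (3, 4, 5, 6):
    towns = "ABCDEF"[:n]
    print(n, distinct_routes(towns), factorial(n) // 2)
```

Both columns agree (n!/2), and the enumeration itself already becomes impractical long before n reaches 20.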
Another point I wanted to make a quick mention of is that any algorithm that has a complexity of O(n^a) is said to have polynomial complexity or is solvable in polynomial time.
O(n), O(n^2) etc. are all polynomial time. Some problems cannot be solved in polynomial time, or at least no polynomial-time algorithm is known for them. Certain things are used in the world because of this. Public Key Cryptography is a prime example. It is computationally hard to find the two prime factors of a very large number. If it wasn't, we couldn't use the public key systems we use.
Anyway, that's it for my (hopefully plain English) explanation of BigOh (revised).
Upvotes: 7126
Reputation: 1806
Big O in plain English is like <= (less than or equal). When we say, for two functions f and g, that f = O(g), it means that f <= g.
However, this does not mean that f(n) <= g(n) for every n. What it actually means is that f is less than or equal to g in terms of growth: after a certain point, f(n) <= c*g(n) for some constant c. "After a certain point" means for all n >= n0, where n0 is another constant.
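As a rough illustration of the "after a point, f(n) <= c*g(n)" idea, here is a small numeric check. The functions f and g and the constants c and n0 are arbitrary examples, and a finite check like this can only illustrate a bound, never prove it for all n.

```python
def bounded_from(f, g, c, n0, up_to=10_000):
    """Check f(n) <= c * g(n) for every integer n in [n0, up_to]."""
    return all(f(n) <= c * g(n) for n in range(n0, up_to + 1))


def f(n):
    return 3 * n + 10


def g(n):
    return n


print(bounded_from(f, g, c=4, n0=10))  # True: 3n + 10 <= 4n once n >= 10
print(bounded_from(f, g, c=4, n0=1))   # False: the bound fails for small n
print(bounded_from(lambda n: n * n, g, c=100, n0=1))  # False: n^2 outgrows 100n
```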
Upvotes: 4
Reputation: 1130
Just to express the complexity of an algorithm in a fast and simple way. Big O notation exists to explain the best, worst, and average-case time complexities of any given algorithm. These are just numerical functions over the size of possible problem instances.
On the other hand, it is very difficult to work precisely with these functions, because they tend to:
It proves to be much easier to talk in terms of simple upper and lower bounds of time-complexity functions using the Big Oh notation. The Big Oh simplifies our analysis by ignoring levels of detail that do not impact our comparison of algorithms. The Big Oh notation ignores the difference between multiplicative constants. The functions f(n) = 2n and g(n) = n are identical in Big Oh analysis.
https://mimoza.marmara.edu.tr/~msakalli/cse706_12/SkienaTheAlgorithmDesignManual.pdf
Upvotes: 0
Reputation: 780
When we have a function like f(n) = n+3 and we want to know how the graph looks when n approaches infinity, we just drop all the constants and lower order terms because they don't matter when n gets big. Which leaves us with f(n) = n, so why can't we just use this? Why do we need to look for some function which is above and below our f(n) = n+3 function, i.e. big O and big Omega?
Because it would be incorrect to say that the function is just f(n) = n when n approaches infinity, so to be correct we describe the area where f(n) = n+3 could be. We are not interested in where the graph is exactly, because lower order terms and constants don't change the growth of the graph significantly; in other words, the area enclosed by the upper and lower bound is a vague version of our f(n) = n+3 function.
The mere dropping of the constant and lower order term is exactly the process of finding the function which is below and above.
By definition, a function is an upper (or lower) bound of another function if you can find a constant by which to multiply f(n) = n so that, for every sufficiently large n, the output is bigger (or smaller, for a lower bound) than that of the original function:
n*C > n+3
And yes, C = 2 would do it (for n > 3), therefore our function f(n) = n can be an upper bound of our f(n) = n+3 function.
Same for lower bound:
n*C < n+3
C = 1 would do it (the constant has to be positive).
So f(n) = n is the upper and lower bound of f(n) = n+3; when it's both big O and Omega then it's Theta, which means it's tightly bound.
So big O could also be f(n) = n^2, because it fulfills the condition n^2*C > n+3. It's above our f(n) = n+3 graph, but the area between this upper bound and the lower bound is not as precise as our earlier bounds.
Upvotes: 3
Reputation: 51543
From (source) one can read:
Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. (..) In computer science, big O notation is used to classify algorithms according to how their run time or space requirements grow as the input size grows.
Big O notation does not represent a function per se but rather a set of functions with a certain asymptotic upper bound; as one can read from the source:
Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation.
Informally, in computer-science time-complexity and space-complexity theories, one can think of Big O notation as a categorization of algorithms with a certain worst-case scenario concerning time and space, respectively. For instance, O(n):
An algorithm is said to take linear time/space, or O(n) time/space, if its time/space complexity is O(n). Informally, this means that the running time/space increases at most linearly with the size of the input (source).
and O(n log n) as:
An algorithm is said to run in quasilinear time/space if T(n) = O(n log^k n) for some positive constant k; linearithmic time/space is the case k = 1 (source).
Nonetheless, such relaxed phrasing is typically used to quantify (for the worst-case scenario) how one set of algorithms behaves compared with another as their input sizes increase. To compare two classes of algorithms (e.g., O(n log n) and O(n)), one should analyze how both classes behave as the input size n increases in the worst-case scenario, i.e., as n tends to infinity.
In the image above, big-O denotes one of the asymptotically least upper bounds of the plotted functions, and does not refer to the sets O(f(n)).
For instance, comparing O(n log n) vs. O(n): as one can see in the image, after a certain input size O(n log n) (green line) grows faster than O(n) (yellow line). That is why (for the worst case) O(n) is more desirable than O(n log n): as one increases the input size, the cost grows more slowly with the former than with the latter.
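A quick way to see the gap widen without a plot is to tabulate both growth functions. This is only a sketch: constant factors are ignored, log base 2 is an arbitrary choice, and the function name is made up.

```python
import math


def growth_table(sizes):
    """For each n, return (n, n, n*log2(n), ratio between the two)."""
    rows = []
    for n in sizes:
        linear = n
        linearithmic = n * math.log2(n)
        rows.append((n, linear, linearithmic, linearithmic / linear))
    return rows


for n, linear, linearithmic, ratio in growth_table([2, 1024, 2**20]):
    print(f"n={n:>8}  n={linear:>8}  n log n={linearithmic:>12.0f}  ratio={ratio:.0f}")
```

The ratio column is just log2(n): it keeps growing, slowly, as the input grows.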
Upvotes: 2
Reputation: 5960
Algorithm example (Java):
public boolean search(/* for */ Integer K, /* in */ List</* of */ Integer> L)
{
    for (/* each */ Integer i : /* in */ L)
    {
        // equals() compares values; == on Integer objects compares references
        if (i.equals(K))
        {
            return true;
        }
    }
    return false;
}
Algorithm description:
This algorithm searches a list, item by item, looking for a key,
Iterating on each item in the list, if it's the key then return True,
If the loop has finished without finding the key, return False.
Big-O notation represents the upper-bound on the Complexity (Time, Space, ..)
To find The Big-O on Time Complexity:
Calculate how much time (regarding input size) the worst case takes:
Worst-Case: the key doesn't exist in the list.
Time(Worst-Case) = 4n+1
Time: O(4n+1) = O(n) | in Big-O, constants are neglected
O(n) ~ Linear
There's also Big-Omega, which here represents the complexity of the Best-Case:
Best-Case: the key is the first item.
Time(Best-Case) = 4
Time: Ω(4) = Ω(1) ~ Instant/Constant
Upvotes: 26
Reputation: 179
There are some great answers already posted, but I would like to contribute in a different way. If you want to visualize what is happening, you can assume that a computer can perform close to 10^8 operations in ~1 sec. If the input size is around 10^8, you might want to design an algorithm that operates in a linear fashion (like an un-nested for-loop). Below is the table that can help you quickly figure out the type of algorithm you need ;)
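Here is a back-of-the-envelope sketch of that rule of thumb. The 10^8 operations per second figure is the rough assumption quoted above (real machines vary widely), and the function name and the set of complexity classes are chosen just for illustration.

```python
import math

OPS_PER_SECOND = 10**8  # rough assumption from the text, not a measurement


def estimated_seconds(n):
    """Very rough running-time estimates for an input of size n."""
    costs = {
        "O(log n)": math.log2(n),
        "O(n)": n,
        "O(n log n)": n * math.log2(n),
        "O(n^2)": n ** 2,
    }
    return {name: ops / OPS_PER_SECOND for name, ops in costs.items()}


for name, seconds in estimated_seconds(10**8).items():
    print(f"{name:<11} ~ {seconds:.3g} s")
```

With n = 10^8 the linear algorithm takes about a second, while the quadratic one would need about 10^8 seconds, which is over three years.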
Upvotes: 3
Reputation: 610
I found a really great explanation of big O notation, especially for someone who's not much into mathematics.
https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
Big O notation is used in Computer Science to describe the performance or complexity of an algorithm. Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.
Anyone who's read Programming Pearls or any other Computer Science books and doesn’t have a grounding in Mathematics will have hit a wall when they reached chapters that mention O(N log N) or other seemingly crazy syntax. Hopefully this article will help you gain an understanding of the basics of Big O and Logarithms.
As a programmer first and a mathematician second (or maybe third or fourth) I found the best way to understand Big O thoroughly was to produce some examples in code. So, below are some common orders of growth along with descriptions and examples where possible.
O(1)
O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set.
bool IsFirstElementNull(IList<string> elements)
{
    return elements[0] == null;
}
O(N)
O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set. The example below also demonstrates how Big O favours the worst-case performance scenario; a matching string could be found during any iteration of the for loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.
bool ContainsValue(IList<string> elements, string value)
{
    foreach (var element in elements)
    {
        if (element == value) return true;
    }
    return false;
}
O(N^2)
O(N^2) represents an algorithm whose performance is directly proportional to the square of the size of the input data set. This is common with algorithms that involve nested iterations over the data set. Deeper nested iterations will result in O(N^3), O(N^4) etc.
bool ContainsDuplicates(IList<string> elements)
{
    for (var outer = 0; outer < elements.Count; outer++)
    {
        for (var inner = 0; inner < elements.Count; inner++)
        {
            // Don't compare with self
            if (outer == inner) continue;
            if (elements[outer] == elements[inner]) return true;
        }
    }
    return false;
}
O(2^N)
O(2^N) denotes an algorithm whose growth doubles with each addition to the input data set. The growth curve of an O(2^N) function is exponential - starting off very shallow, then rising meteorically. An example of an O(2^N) function is the recursive calculation of Fibonacci numbers:
int Fibonacci(int number)
{
    if (number <= 1) return number;
    return Fibonacci(number - 2) + Fibonacci(number - 1);
}
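To get a feel for that growth, the sketch below is a Python port of the same naive recursion that also counts how many calls it makes. The port and the counting are additions for illustration, not part of the quoted article.

```python
def fibonacci_calls(number):
    """Return (fib(number), number of calls the naive recursion makes)."""
    if number <= 1:
        return number, 1
    a, calls_a = fibonacci_calls(number - 2)
    b, calls_b = fibonacci_calls(number - 1)
    return a + b, 1 + calls_a + calls_b


for n in (5, 10, 15, 20):
    value, calls = fibonacci_calls(n)
    print(f"Fibonacci({n}) = {value}, computed with {calls} calls")
```

Strictly speaking the call count grows by a factor of about 1.6 per extra step rather than 2, so O(2^N) is an upper bound here, but the curve is exponential either way.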
Logarithms
Logarithms are slightly trickier to explain so I'll use a common example:
Binary search is a technique used to search sorted data sets. It works by selecting the middle element of the data set, essentially the median, and compares it against a target value. If the values match it will return success. If the target value is higher than the value of the probe element it will take the upper half of the data set and perform the same operation against it. Likewise, if the target value is lower than the value of the probe element it will perform the operation against the lower half. It will continue to halve the data set with each iteration until the value has been found or until it can no longer split the data set.
This type of algorithm is described as O(log N). The iterative halving of data sets described in the binary search example produces a growth curve that peaks at the beginning and slowly flattens out as the size of the data sets increase e.g. an input data set containing 10 items takes one second to complete, a data set containing 100 items takes two seconds, and a data set containing 1000 items will take three seconds. Doubling the size of the input data set has little effect on its growth as after a single iteration of the algorithm the data set will be halved and therefore on a par with an input data set half the size. This makes algorithms like binary search extremely efficient when dealing with large data sets.
Upvotes: 8
Reputation: 19151
Assume we're talking about an algorithm A, which should do something with a dataset of size n.
Then O( <some expression X involving n> ) means, in simple English:
If you're unlucky when executing A, it might take as much as X(n) operations to complete.
As it happens, there are certain functions (think of them as implementations of X(n)) that tend to occur quite often. These are well known and easily compared (examples: 1, Log N, N, N^2, N!, etc.).
By comparing these when talking about A and other algorithms, it is easy to rank the algorithms according to the number of operations they may (worst-case) require to complete.
In general, our goal will be to find or structure an algorithm A in such a way that it will have a function X(n) that returns as low a number as possible.
Upvotes: 12
Reputation: 3356
It represents the speed of an algorithm in the long run.
To take a literal analogy, you don't care how fast a runner can sprint a 100m dash, or even a 5k run. You care more about marathoners, and preferably ultra marathoners (beyond which the analogy to running breaks down and you have to revert to the metaphorical meaning of "the long run").
You can safely stop reading here.
I'm adding this answer because I'm surprised how mathematical and technical the rest of the answers are. The notion of the "long run" in first sentence is related to the arbitrarily time-consuming computational tasks. Unlike running, which is limited by human capacity, computational tasks can take even more than millions of years for certain algorithms to complete.
What about all those mathematical logarithms and polynomials? It turns out that algorithms are intrinsically related to these mathematical terms. If you are measuring the heights of all the kids on the block, it will take you as much time as there are kids. This is intrinsically related to the notion of n^1 or just n where n is nothing more than the number of kids on the block. In the ultra-marathon case, you are measuring the heights of all the kids in your city, but you then have to ignore travel times and assume they are all available to you in a line (otherwise we jump ahead of the current explanation).
Suppose then you are trying to arrange the list that you made of the kids' heights in order from shortest to tallest. If it is just the kids in your neighborhood you might just eyeball it and come up with the ordered list. This is the "sprint" analogy, and we truly don't care about sprints in computer science because why use a computer when you can eyeball something?
But if you were arranging the list of the heights of all kids in your city, or better yet, your country, then you will find that how you do it is intrinsically tied to the mathematical log and n^2. Going through your list to find the shortest kid, writing his name in a separate notebook, and crossing it out from the original notebook is intrinsically tied to the mathematical n^2. If you think of arranging half your notebook, then the other half, and then combining the results, you will arrive at a method that is intrinsically tied to the logarithm.
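The two notebook strategies correspond roughly to selection sort and merge sort. The sketch below counts height comparisons for each; the function names and the 1024 made-up heights are assumptions for the example, and only comparisons are counted (not the bookkeeping around them).

```python
def selection_sort(heights):
    """Repeatedly find the shortest remaining kid; return (sorted, comparisons)."""
    remaining = list(heights)
    ordered, comparisons = [], 0
    while remaining:
        shortest = 0
        for i in range(1, len(remaining)):
            comparisons += 1
            if remaining[i] < remaining[shortest]:
                shortest = i
        ordered.append(remaining.pop(shortest))  # cross it out, write it down
    return ordered, comparisons


def merge_sort(heights):
    """Sort each half, then combine the results; return (sorted, comparisons)."""
    if len(heights) <= 1:
        return list(heights), 0
    middle = len(heights) // 2
    left, left_cmp = merge_sort(heights[:middle])
    right, right_cmp = merge_sort(heights[middle:])
    merged, comparisons = [], left_cmp + right_cmp
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


heights = [(i * 7919) % 1024 for i in range(1024)]  # made-up, scrambled heights
print(selection_sort(heights)[1], merge_sort(heights)[1])
```

For 1024 kids the first method needs 523,776 comparisons (about n^2 / 2), while the second needs at most 1024 × 10 (about n log n).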
Finally, suppose you first had to go to the store to buy a measuring tape. This is an example of an effort that is of consequence in short sprints, such as measuring the kids on the block, but when you are measuring all the kids in the city you can safely ignore this cost. This is the intrinsic connection to the mathematical dropping of say lower order polynomial terms.
I hope I have explained that the big-O notation is merely about the long run, that the mathematics is inherently connected to ways of computation, and that the dropping of mathematical terms and other simplifications are connected to the long run in a rather common sense way.
Once you realize this, you'll find that big-O is really super-easy because all the hard high school math just drops out easily. The only difficult part is analyzing an algorithm to identify the mathematical terms, but with some practice you can start dropping terms during the analysis itself and safely ignore chunks of the algorithm to focus only on the part that is relevant to the big-O. I.e. you should be able to eyeball most situations.
Happy big-O-ing, it was my favorite thing about Computer Science -- finding that something was way easier than I thought, and then being able to show off at Google interviews when the uninitiated would be intimidated, lol.
Upvotes: 5
Reputation:
If I want to explain this to a 6-year-old child, I will start by drawing some functions, f(x) = x and f(x) = x^2 for example, and ask the child which function will be the upper function at the top of the page. Then we will proceed with drawing and see that x^2 wins. "Who wins" is actually the function which grows faster when x tends to infinity. So "function x is in Big O of x^2" means that x grows slower than x^2 when x tends to infinity. The same can be done when x tends to 0. If we draw these two functions for x from 0 to 1, x will be the upper function, so "function x^2 is in Big O of x for x tending to 0". When the child gets older, I add that Big O really can be a function which grows not faster but at the same rate as the given function. Moreover, constants are discarded. So 2x is in Big O of x.
Upvotes: 3
Reputation: 28666
Definition: Big O notation is a notation which says how an algorithm's performance will change as the data input increases.
When we talk about algorithms there are 3 important pillars: input, output, and processing. Big O is a symbolic notation which says at what rate the performance of the algorithm's processing will vary as the data input is increased.
I would encourage you to see this youtube video which explains Big O Notation in depth with code examples.
So for example assume that an algorithm takes 5 records and the time required for processing them is 27 seconds. Now if we increase the records to 10 the algorithm takes 105 seconds.
In simple words the time taken is roughly the square of the number of records. We can denote this by O(n^2). This symbolic representation is termed Big O notation.
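One hedged way to check the "square" claim from two measurements: if the time behaves like c * n^k, the exponent k can be estimated from the ratio of the timings. The function name is made up, and two data points give only a crude estimate.

```python
import math


def estimated_exponent(n1, t1, n2, t2):
    """If time ~ c * n^k, then k = log(t2 / t1) / log(n2 / n1)."""
    return math.log(t2 / t1) / math.log(n2 / n1)


k = estimated_exponent(5, 27, 10, 105)
print(round(k, 2))  # 1.96, i.e. close to 2
```

Doubling the input made the time about four times longer, which is the signature of quadratic growth.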
Now please note the units can be anything: inputs can be in bytes, bits, or number of records, and performance can be measured in any unit like seconds, minutes, days and so on. So it's not the exact unit that matters but rather the relationship.
For example, look at the function "Function1" below, which takes a collection and processes only the first record. For this function the performance will be the same irrespective of whether you pass 1000, 10000 or 100000 records. So we can denote it by O(1).
void Function1(List<string> data)
{
    string str = data[0];
}
Now see the below function "Function2()". In this case the processing time will increase with number of records. We can denote this algorithm performance using O(n).
void Function2(List<string> data)
{
    foreach (string str in data)
    {
        if (str == "shiv")
        {
            return;
        }
    }
}
When we see a Big O notation for any algorithm we can classify them in to three categories of performance :-
So by looking at Big O notation we categorize good and bad zones for algorithms.
I would recommend you to watch this 10 minutes video which discusses Big O with sample code
https://www.youtube.com/watch?v=k6kxtzICG_g
Upvotes: 10
Reputation: 6431
TLDR: Big O explains performance of an algorithm in mathematical terms.
Slower algorithms tend to run in n to the power of x (or more), depending on how deeply their loops are nested, whereas faster ones like binary search run in O(log n), so their advantage grows as the data set gets larger. Big O can be expressed with other terms using n, or without using n at all (i.e. O(1)).
One can calculate Big O by looking at the most complex lines of the algorithm.
With small or unsorted datasets Big O can be surprising, as O(log n) algorithms like binary search need sorted input, and sorting first costs O(n log n), so they can be slower for smaller or unsorted sets. For a simple running example of linear search versus binary search, take a look at my JavaScript example:
https://codepen.io/serdarsenay/pen/XELWqN?editors=1011 (algorithms written below)
function lineerSearch() {
  init();
  var t = timer('lineerSearch benchmark');
  var input = this.event.target.value;
  // iterate over every element, including the last one
  for (var i = 0; i < unsortedhaystack.length; i++) {
    if (unsortedhaystack[i] === input) {
      document.getElementById('result').innerHTML = 'result is... "' + unsortedhaystack[i] + '", on index: ' + i + ' of the unsorted array. Found' + ' within ' + i + ' iterations';
      console.log(document.getElementById('result').innerHTML);
      t.stop();
      return unsortedhaystack[i];
    }
  }
}
function binarySearch () {
  init();
  sortHaystack();
  var t = timer('binarySearch benchmark');
  var firstIndex = 0;
  var lastIndex = haystack.length-1;
  var input = this.event.target.value;
  //currently point in the half of the array
  var currentIndex = (haystack.length-1)/2 | 0;
  var iterations = 0;
  while (firstIndex <= lastIndex) {
    currentIndex = (firstIndex + lastIndex)/2 | 0;
    iterations++;
    if (haystack[currentIndex] < input) {
      firstIndex = currentIndex + 1;
      //console.log(currentIndex + " added, fI:"+firstIndex+", lI: "+lastIndex);
    } else if (haystack[currentIndex] > input) {
      lastIndex = currentIndex - 1;
      //console.log(currentIndex + " substracted, fI:"+firstIndex+", lI: "+lastIndex);
    } else {
      document.getElementById('result').innerHTML = 'result is... "' + haystack[currentIndex] + '", on index: ' + currentIndex + ' of the sorted array. Found' + ' within ' + iterations + ' iterations';
      console.log(document.getElementById('result').innerHTML);
      t.stop();
      return true;
    }
  }
}
Upvotes: 2
Reputation: 22113
What is a plain English explanation of “Big O” notation?
Very Quick Note:
The O in "Big O" stands for "Order" (or, precisely, "order of"),
so you could take the idea literally: it's used to order things so as to compare them.
"Big O" does two things:
Notations. There are seven most-used notations:
- O(1): 1 step, it's excellent, Order No.1
- O(logN): logN steps, it's good, Order No.2
- O(N): N steps, it's fair, Order No.3
- O(NlogN): NlogN steps, it's not good, Order No.4
- O(N^2): N^2 steps, it's bad, Order No.5
- O(2^N): 2^N steps, it's horrible, Order No.6
- O(N!): N! steps, it's terrible, Order No.7
Suppose you get the notation O(N^2): not only is it clear that the method takes N*N steps to accomplish a task, you also see from its ranking that it is not as good as O(NlogN).
Please note the order at each line's end, just for your better understanding. There are more than 7 notations if all possibilities are considered.
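The ranking can be sanity-checked by plugging in a concrete N. This is just a sketch: the function name is made up, log base 2 is an arbitrary choice, and for very small N some neighbours in the list tie or swap places, so the order should be read as "for large enough N".

```python
import math


def steps_for(n):
    """Number of steps each of the seven notations stands for at input size n."""
    return [
        ("O(1)", 1),
        ("O(logN)", math.log2(n)),
        ("O(N)", n),
        ("O(NlogN)", n * math.log2(n)),
        ("O(N^2)", n ** 2),
        ("O(2^N)", 2 ** n),
        ("O(N!)", math.factorial(n)),
    ]


for name, steps in steps_for(20):
    print(f"{name:<9} {steps:,.0f}")
```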
In CS, the set of steps to accomplish a task is called an algorithm.
In terminology, Big O notation is used to describe the performance or complexity of an algorithm.
In addition, Big O establishes the worst case, or measures the upper bound on steps.
You could refer to Big-Ω (Big-Omega) for the best case.
Big-Ω (Big-Omega) notation (article) | Khan Academy
Summary
"Big O" describes the algorithm's performance and evaluates it.
Or, to put it formally, "Big O" classifies algorithms and standardizes the comparison process.
Upvotes: 12
Reputation: 358
Big O - Economic Point of View.
My favourite English word to describe this concept is the price you pay for a task as it grows larger.
Think of it as recurring costs instead of fixed costs that you would pay at the beginning. The fixed costs become negligible in the big picture because costs only grow and they add up. We want to measure how fast they would grow and how soon they would add up with respect to the raw material we give to the set up - size of the problem.
However, if initial set up costs are high and you only produce a small amount of the product, you would want to look at these initial costs - they are also called the constants.
Since these constants don't matter in the long run, this language allows us to discuss tasks beyond what kind of infrastructure we are running them on. So, the factories can be anywhere and the workers can be whoever - it's all gravy. But the size of the factory and the number of workers would be the things we could vary in the long run as your inputs and outputs grow.
Hence, this becomes a big picture approximation of how much you would have to spend to run something. Since time and space are the economic quantities (i.e. they are limited) here, they can both be expressed using this language.
Technical notes: Some examples of time complexity - O(n) generally means that if a problem is of size 'n', I at least have to see everything. O(log n) generally means that I halve the size of the problem and check and repeat until the task is done. O(n^2) means I need to look at pairs of things (like handshakes at a party between n people).
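Two of these intuitions are easy to count directly. The sketch below (function names are made up) lists every handshake between n guests and counts how many times a problem can be halved before a single item is left.

```python
from itertools import combinations


def handshakes(people):
    """Every unordered pair of guests: n * (n - 1) / 2 of them."""
    return list(combinations(people, 2))


def halvings(size):
    """How many times a problem of this size can be halved until one item is left."""
    steps = 0
    while size > 1:
        size //= 2
        steps += 1
    return steps


print(len(handshakes(range(10))), len(handshakes(range(100))))  # 45 4950
print(halvings(1024), halvings(1_000_000))                      # 10 19
```

Ten times more guests means about a hundred times more handshakes, while a thousand times more data adds only about ten halvings.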
Upvotes: 4
Reputation: 5531
algorithm: procedure/formula for solving a problem
How do we analyze algorithms and how can we compare algorithms against each other?
example: you and a friend are asked to create a function to sum the numbers from 0 to N. You come up with f(x) and your friend comes up with g(x). Both functions have the same result, but a different algorithm. In order to objectively compare the efficiency of the algorithms we use Big-O notation.
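The answer does not say what f(x) and g(x) actually do, so the two functions below are hypothetical stand-ins: one adds the numbers one at a time, the other uses the closed-form formula N(N+1)/2. Same result, very different amounts of work.

```python
def sum_by_loop(n):
    """Add 0 + 1 + ... + n one number at a time: about n additions, O(n)."""
    total = 0
    for value in range(n + 1):
        total += value
    return total


def sum_by_formula(n):
    """Use n * (n + 1) / 2: a fixed number of operations, O(1)."""
    return n * (n + 1) // 2


print(sum_by_loop(100), sum_by_formula(100))  # 5050 5050
```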
Big-O notation: describes how quickly runtime will grow relative to the input as the input gets arbitrarily large.
3 key takeaways:
Space complexity: aside from time complexity, we also care about space complexity (how much memory/space an algorithm uses). Instead of checking the time of operations, we check the size of the allocation of memory.
Upvotes: 8
Reputation: 1084
Big O is describing a class of functions.
It describes how fast functions grow for big input values.
For a given function f, O(f) describes all functions g(n) for which you can find an n0 and a constant c so that all values of g(n) with n >= n0 are less than or equal to c*f(n).
In less mathematical words, O(f) is a set of functions: namely, all functions that, from some value n0 onwards, grow slower than or as fast as f.
If f(n) = n then
g(n) = 3n is in O(f), because constant factors do not matter. h(n) = n+1000 is in O(f): it is bigger than 2n for all values smaller than 1000, but for big O only huge inputs matter, and from n = 1000 onwards n+1000 <= 2n.
However, i(n) = n^2 is not in O(f), because a quadratic function grows faster than a linear one.
Upvotes: 5
Reputation: 717
Big O is a means to represent the upper bounds of any function. We generally use it for expressing the upper bounds of a function that tells the running time of an Algorithm.
Ex: let f(n) = 2(n^2) + 3n be a function representing the running time of a hypothetical algorithm. Big-O notation essentially gives the upper limit for this function, which is O(n^2).
This notation basically tells us that, for any sufficiently large input size 'n', the running time won't be greater than a constant multiple of the expression inside the Big-O notation.
Also, agree with all the above detailed answers. Hope this helps !!
Upvotes: 6
Reputation: 3758
You want to know all there is to know of big O? So do I.
So to talk of big O, I will use words that have just one beat in them. One sound per word. Small words are quick. You know these words, and so do I. We will use words with one sound. They are small. I am sure you will know all of the words we will use!
Now, let’s you and me talk of work. Most of the time, I do not like work. Do you like work? It may be the case that you do, but I am sure I do not.
I do not like to go to work. I do not like to spend time at work. If I had my way, I would like just to play, and do fun things. Do you feel the same as I do?
Now at times, I do have to go to work. It is sad, but true. So, when I am at work, I have a rule: I try to do less work. As near to no work as I can. Then I go play!
So here is the big news: the big O can help me not to do work! I can play more of the time, if I know big O. Less work, more play! That is what big O helps me do.
Now I have some work. I have this list: one, two, three, four, five, six. I must add all things in this list.
Wow, I hate work. But oh well, I have to do this. So here I go.
One plus two is three… plus three is six... and four is... I don’t know. I got lost. It is too hard for me to do in my head. I don’t much care for this kind of work.
So let's not do the work. Let's you and me just think how hard it is. How much work would I have to do, to add six numbers?
Well, let’s see. I must add one and two, and then add that to three, and then add that to four… All in all, I count six adds. I have to do six adds to solve this.
Here comes big O, to tell us just how hard this math is.
Big O says: we must do six adds to solve this. One add, for each thing from one to six. Six small bits of work... each bit of work is one add.
Well, I will not do the work to add them now. But I know how hard it would be. It would be six adds.
Oh no, now I have more work. Sheesh. Who makes this kind of stuff?!
Now they ask me to add from one to ten! Why would I do that? I did not want to add one to six. To add from one to ten… well… that would be even more hard!
How much more hard would it be? How much more work would I have to do? Do I need more or less steps?
Well, I guess I would have to do ten adds… one for each thing from one to ten. Ten is more than six. I would have to work that much more to add from one to ten, than one to six!
I do not want to add right now. I just want to think on how hard it might be to add that much. And, I hope, to play as soon as I can.
To add from one to six, that is some work. But do you see, to add from one to ten, that is more work?
Big O is your friend and mine. Big O helps us think on how much work we have to do, so we can plan. And, if we are friends with big O, he can help us choose work that is not so hard!
Now we must do new work. Oh, no. I don’t like this work thing at all.
The new work is: add all things from one to n.
Wait! What is n? Did I miss that? How can I add from one to n if you don’t tell me what n is?
Well, I don’t know what n is. I was not told. Were you? No? Oh well. So we can’t do the work. Whew.
But though we will not do the work now, we can guess how hard it would be, if we knew n. We would have to add up n things, right? Of course!
Now here comes big O, and he will tell us how hard this work is. He says: to add all things from one to n, one by one, is O(n). To add all these things, [I know I must add n times.][1] That is big O! He tells us how hard it is to do some type of work.
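Here is a sketch in C of that work. The names are mine, made up for this note. The first one adds the things one by one and counts each add; the next one is the cheat from note [1], which takes just one step for all n.

```c
/* Adds 1 + 2 + ... + n one by one; *adds counts how many adds were done. */
long long sum_one_by_one(long long n, long long *adds) {
    long long total = 0;
    *adds = 0;
    for (long long i = 1; i <= n; i++) {
        total += i;     /* one add for each thing in the list */
        (*adds)++;
    }
    return total;
}

/* The cheat from note [1]: one step, no matter how big n is. */
long long sum_all_at_once(long long n) {
    return n * (n + 1) / 2;
}
```

If n is ten, the first one does ten adds. If n is ten times as big, it does ten times more adds. That is what O(n) means.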
To me, I think of big O like a big, slow, boss man. He thinks on work, but he does not do it. He might say, "That work is quick." Or, he might say, "That work is so slow and hard!" But he does not do the work. He just looks at the work, and then he tells us how much time it might take.
I care lots for big O. Why? I do not like to work! No one likes to work. That is why we all love big O! He tells us how fast we can work. He helps us think of how hard work is.
Uh oh, more work. Now, let’s not do the work. But, let’s make a plan to do it, step by step.
They gave us a deck of ten cards. They are all mixed up: seven, four, two, six… not straight at all. And now... our job is to sort them.
Ergh. That sounds like a lot of work!
How can we sort this deck? I have a plan.
I will look at each pair of cards, pair by pair, through the deck, from first to last. If the first card in one pair is big and the next card in that pair is small, I swap them. Else, I go to the next pair, and so on and so on... and soon, the deck is done.
When the deck is done, I ask: did I swap cards in that pass? If so, I must do it all once more, from the top.
At some point, at some time, there will be no swaps, and our sort of the deck would be done. So much work!
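Here is a sketch in C of that plan. The names are mine, made up for this note. It sorts the deck from low to high, and it counts each pair of cards I had to look at.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sorts deck[0..n-1] from low to high with the pair-swap rule.
   Gives back the count of pairs it had to look at. */
long swap_pass_sort(int deck[], size_t n) {
    long looks = 0;
    bool swapped = true;
    while (swapped) {                       /* once more from the top if the last pass swapped */
        swapped = false;
        for (size_t i = 0; i + 1 < n; i++) {
            looks++;
            if (deck[i] > deck[i + 1]) {    /* first card big, next card small: swap */
                int tmp = deck[i];
                deck[i] = deck[i + 1];
                deck[i + 1] = tmp;
                swapped = true;
            }
        }
    }
    return looks;
}
```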
Well, how much work would that be, to sort the cards with those rules?
I have ten cards. And, most of the time -- that is, if I don’t have lots of luck -- I must go through the whole deck up to ten times, with up to ten card swaps each time through the deck.
Big O, help me!
Big O comes in and says: for a deck of n cards, to sort it this way will be done in O(n squared) time.
Why does he say n squared?
Well, you know n squared is n times n. Now, I get it: n cards checked, up to what might be n times through the deck. That is two loops, each with n steps. That is n squared much work to be done. A lot of work, for sure!
Now when big O says it will take O(n squared) work, he does not mean n squared adds, on the nose. It might be some small bit less, for some case. But in the worst case, it will be near n squared steps of work to sort the deck.
Now here is where big O is our friend.
Big O points out this: as n gets big, when we sort cards, the job gets MUCH MUCH MORE HARD than the old just-add-these-things job. How do we know this?
Well, if n gets real big, we do not care what we might add to n or n squared.
For big n, n squared is more large than n.
Big O tells us that to sort things is more hard than to add things. O(n squared) is more than O(n) for big n. That means: if n gets real big, to sort a mixed deck of n things MUST take more time, than to just add n mixed things.
Big O does not solve the work for us. Big O tells us how hard the work is.
I have a deck of cards. I did sort them. You helped. Thanks.
Is there a more fast way to sort the cards? Can big O help us?
Yes, there is a more fast way! It takes some time to learn, but it works... and it works quite fast. You can try it too, but take your time with each step and do not lose your place.
In this new way to sort a deck, we do not check pairs of cards the way we did a while ago. Here are your new rules to sort this deck:
One: I choose one card in the part of the deck we work on now. You can choose one for me if you like. (The first time we do this, “the part of the deck we work on now” is the whole deck, of course.)
Two: I splay the deck on that card you chose. What is this splay; how do I splay? Well, I go from the start card down, one by one, and I look for a card that is more high than the splay card.
Three: I go from the end card up, and I look for a card that is more low than the splay card.
Once I have found these two cards, I swap them, and go on to look for more cards to swap. That is, I go back to step Two, and splay on the card you chose some more.
At some point, this loop (from Two to Three) will end. It ends when both halves of this search meet at the splay card. Then, we have just splayed the deck with the card you chose in step One. Now, all the cards near the start are more low than the splay card; and the cards near the end are more high than the splay card. Cool trick!
Four (and this is the fun part): I have two small decks now, one more low than the splay card, and one more high. Now I go to step one, on each small deck! That is to say, I start from step One on the first small deck, and when that work is done, I start from step One on the next small deck.
I break up the deck in parts, and sort each part, more small and more small, and at some time I have no more work to do. Now this may seem slow, with all the rules. But trust me, it is not slow at all. It is much less work than the first way to sort things!
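Here is a sketch in C of those rules. The names are mine, made up for this note, and I pick the card in the midst of the part as the splay card; you may pick some card else if you like.

```c
/* Sorts deck[lo..hi] from low to high with the splay rules. */
void splay_sort(int deck[], int lo, int hi) {
    if (lo >= hi) return;                   /* a part with one card or none is done */

    int splay = deck[lo + (hi - lo) / 2];   /* step One: choose the splay card */
    int i = lo, j = hi;
    while (i <= j) {
        while (deck[i] < splay) i++;        /* step Two: from the start, find a card not more low */
        while (deck[j] > splay) j--;        /* step Three: from the end, find a card not more high */
        if (i <= j) {                       /* swap the two, then look for more */
            int tmp = deck[i];
            deck[i] = deck[j];
            deck[j] = tmp;
            i++;
            j--;
        }
    }

    splay_sort(deck, lo, j);                /* step Four: the low small deck */
    splay_sort(deck, i, hi);                /* ... and then the high small deck */
}
```

To sort a deck of ten cards, call `splay_sort(deck, 0, 9)`.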
What is this sort called? It is called Quick Sort! That sort was made by a man called C. A. R. Hoare and he called it Quick Sort. Now, Quick Sort gets used all the time!
Quick Sort breaks up big decks in small ones. That is to say, it breaks up big tasks in small ones.
Hmmm. There may be a rule in there, I think. To make big tasks small, break them up.
This sort is quite quick. How quick? Big O tells us: this sort needs O(n log n) work to be done, in the mean case.
Is it more or less fast than the first sort? Big O, please help!
The first sort was O(n squared). But Quick Sort is O(n log n). You know that n log n is less than n squared, for big n, right? Well, that is how we know that Quick Sort is fast!
If you have to sort a deck, what is the best way? Well, you can do what you want, but I would choose Quick Sort.
Why do I choose Quick Sort? I do not like to work, of course! I want work done as soon as I can get it done.
How do I know Quick Sort is less work? I know that O(n log n) is less than O(n squared). The O's are more small, so Quick Sort is less work!
Now you know my friend, Big O. He helps us do less work. And if you know big O, you can do less work too!
You learned all that with me! You are so smart! Thank you so much!
Now that work is done, let’s go play!
[1]: There is a way to cheat and add all the things from one to n, all at one time. Some kid named Gauss found this out when he was eight. I am not that smart though, so don't ask me how he did it.
Upvotes: 14
Reputation: 12500
Say you order Harry Potter: Complete 8-Film Collection [Blu-ray] from Amazon and download the same film collection online at the same time. You want to test which method is faster. The delivery takes almost a day to arrive, and the download completes about 30 minutes earlier. Great! So it’s a tight race.
What if I order several Blu-ray movies like The Lord of the Rings, Twilight, The Dark Knight Trilogy, etc. and download all the movies online at the same time? This time, the delivery still takes a day to complete, but the online download takes 3 days to finish. For online shopping, the number of purchased items (input) doesn’t affect the delivery time. The delivery time is constant. We call this O(1).
For online downloading, the download time is directly proportional to the movie file sizes (input). We call this O(n).
From the experiments, we know that online shopping scales better than online downloading. It is very important to understand big O notation because it helps you to analyze the scalability and efficiency of algorithms.
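The two cases can be sketched as two tiny C functions. Everything here is an assumption made for illustration: the function names, the 24-hour delivery (taken from "almost a day" above), and the `hours_per_gb` rate.

```c
/* Hours until the order arrives: the same for any number of items, so O(1).
   The value 24 is an assumption based on "almost a day" in the text. */
double delivery_hours(int items) {
    (void)items;            /* the number of items does not matter */
    return 24.0;
}

/* Hours to download: grows in step with the total file size, so O(n).
   hours_per_gb is a made-up rate, not a measured one. */
double download_hours(double total_gb, double hours_per_gb) {
    return total_gb * hours_per_gb;
}
```

Doubling `total_gb` doubles `download_hours`, while `delivery_hours` stays the same however many items are ordered.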
Note: Big O notation is most often used to describe the worst-case scenario of an algorithm. Let’s assume that O(1) and O(n) are the worst-case scenarios of the example above.
Reference : http://carlcheo.com/compsci
Upvotes: 11
Reputation: 403
This is a very simplified explanation, but I hope it covers the most important details.
Let's say the algorithm that solves your problem depends on some 'factors'; for example, let's call them N and X.
Depending on N and X, your algorithm will require some number of operations; for example, in the WORST case it's 3(N^2) + log(X) operations.
Since Big-O doesn't care much about constant factors (such as the 3), the Big-O of your algorithm is O(N^2 + log(X)). It basically translates to 'the number of operations your algorithm needs in the worst case scales like this'.
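A small sketch in C of what dropping the constant factor means. The function names are hypothetical, and an integer base-2 logarithm stands in for log(X), which is an assumption of this sketch:

```c
/* Floor of log base 2 of x, for x >= 1 (integer stand-in for log(X)). */
int ilog2(unsigned long long x) {
    int k = 0;
    while (x > 1) {
        x /= 2;
        k++;
    }
    return k;
}

/* Hypothetical worst-case operation count: 3 * N^2 + log2(X). */
unsigned long long worst_case_ops(unsigned long long n, unsigned long long x) {
    return 3 * n * n + (unsigned long long)ilog2(x);
}

/* The expression inside the O(...): N^2 + log2(X). */
unsigned long long big_o_shape(unsigned long long n, unsigned long long x) {
    return n * n + (unsigned long long)ilog2(x);
}
```

For any N and X >= 1, `worst_case_ops(N, X)` is never more than 3 times `big_o_shape(N, X)`. That fixed factor of 3 is exactly what Big-O ignores.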
Upvotes: 7
Reputation: 652
Here is a simpler way to understand time complexity. The most common metric for calculating time complexity is Big O notation. This removes all constant factors so that the running time can be estimated in relation to N as N approaches infinity. In general you can think of it like this:
statement;
Is constant. The running time of the statement will not change in relation to N
for ( i = 0; i < N; i++ )
    statement;
Is linear. The running time of the loop is directly proportional to N. When N doubles, so does the running time.
for ( i = 0; i < N; i++ )
{
    for ( j = 0; j < N; j++ )
        statement;
}
Is quadratic. The running time of the two loops is proportional to the square of N. When N doubles, the running time increases by a factor of four.
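Counting how often `statement` runs makes the scaling concrete. A minimal sketch in C; the function name `quadratic_steps` is made up for illustration:

```c
/* Counts how many times `statement` would run in the nested loops. */
unsigned long long quadratic_steps(unsigned long long n) {
    unsigned long long steps = 0;
    for (unsigned long long i = 0; i < n; i++)
        for (unsigned long long j = 0; j < n; j++)
            steps++;        /* stands in for `statement;` */
    return steps;
}
```

`quadratic_steps(200)` is four times `quadratic_steps(100)`: doubling N quadruples the count.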
while ( low <= high )
{
    mid = ( low + high ) / 2;
    if ( target < list[mid] )
        high = mid - 1;
    else if ( target > list[mid] )
        low = mid + 1;
    else break;
}
Is logarithmic. The running time of the algorithm is proportional to the number of times N can be divided by 2. This is because the algorithm divides the working area in half with each iteration.
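The loop above can be wrapped in a complete function. This is a sketch under assumptions: the name `binary_search`, the `steps` counter, and the choice to return -1 when the target is absent are illustrative additions, and the list must already be sorted in ascending order.

```c
/* Searches sorted list[0..n-1] for target.
   Returns its index, or -1 if absent; *steps counts loop iterations. */
int binary_search(const int list[], int n, int target, int *steps) {
    int low = 0, high = n - 1;
    *steps = 0;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* same midpoint, but avoids overflow of low + high */
        (*steps)++;
        if (target < list[mid])
            high = mid - 1;
        else if (target > list[mid])
            low = mid + 1;
        else
            return mid;
    }
    return -1;
}
```

For a sorted list of 1024 items, the loop runs at most 11 times, since the working area is halved on every iteration.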
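void quicksort ( int list[], int left, int right )
{
    if ( left >= right )
        return;
    int pivot = partition ( list, left, right );
    quicksort ( list, left, pivot - 1 );
    quicksort ( list, pivot + 1, right );
}
Is N * log ( N ). The running time consists of about log ( N ) levels of recursion, each of which does linear work to partition its items, thus the algorithm is a combination of linear and logarithmic. (This assumes the pivots split the list fairly evenly; with consistently bad pivots, quicksort degrades to quadratic.)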
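The snippet calls `partition` without showing it. One common way to write it is the Lomuto scheme, sketched below in C. Choosing this scheme, with the last item as the pivot, is an assumption; the answer does not say which partition it has in mind.

```c
/* Lomuto partition: uses list[right] as the pivot, moves smaller items
   to the left of it, and returns the pivot's final index. */
int partition(int list[], int left, int right) {
    int pivot = list[right];
    int store = left;                   /* next slot for an item smaller than the pivot */
    for (int i = left; i < right; i++) {
        if (list[i] < pivot) {
            int tmp = list[i];
            list[i] = list[store];
            list[store] = tmp;
            store++;
        }
    }
    int tmp = list[right];              /* put the pivot between the two parts */
    list[right] = list[store];
    list[store] = tmp;
    return store;
}
```

Each call touches every item in its range once, which is the linear part of N * log ( N ).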
In general, doing something with every item in one dimension is linear, doing something with every item in two dimensions is quadratic, and dividing the working area in half is logarithmic. There are other Big O measures such as cubic, exponential, and square root, but they're not nearly as common. Big O notation is described as O ( <measure> ) where <measure> is the measure. The quicksort algorithm would be described as O ( N * log ( N ) ).
Note: None of this has taken into account best, average, and worst case measures. Each would have its own Big O notation. Also note that this is a VERY simplistic explanation. Big O is the most common, but it's also more complex than I've shown. There are also other notations such as big omega, little o, and big theta. You probably won't encounter them outside of an algorithm analysis course.
Upvotes: 12
Reputation:
If you have a suitable notion of infinity in your head, then there is a very brief description:
Big O notation tells you the cost of solving an infinitely large problem.
And furthermore
Constant factors are negligible
If you upgrade to a computer that can run your algorithm twice as fast, big O notation won't notice that. Constant factor improvements are too small to even be noticed in the scale that big O notation works with. Note that this is an intentional part of the design of big O notation.
However, anything "larger" than a constant factor can be detected.
When interested in doing computations whose size is "large" enough to be considered as approximately infinity, then big O notation is approximately the cost of solving your problem.
If the above doesn't make sense, then you don't have a compatible intuitive notion of infinity in your head, and you should probably disregard all of the above; the only way I know to make these ideas rigorous, or to explain them if they aren't already intuitively useful, is to first teach you big O notation or something similar. (although, once you well understand big O notation in the future, it may be worthwhile to revisit these ideas)
Upvotes: 11
Reputation:
Big O notation is a way of describing how quickly an algorithm will run given an input of arbitrary size, which we'll call "n". It is useful in computer science because different machines operate at different speeds, and simply saying that an algorithm takes 5 seconds doesn't tell you much: while you may be running a system with a 4.5 GHz octo-core processor, I may be running a 15-year-old, 800 MHz system, which could take longer regardless of the algorithm. So instead of specifying how fast an algorithm runs in terms of time, we say how fast it runs in terms of the size of the input, or "n". By describing algorithms in this way, we are able to compare the speeds of algorithms without having to take into account the speed of the computer itself.
Upvotes: 23
Reputation: 4071
Big O
f(x) = O(g(x)) when x goes to a (for example, a = +∞) means that there is a function k such that:
f(x) = k(x)g(x)
k is bounded in some neighborhood of a (if a = +∞, this means that there are numbers N and M such that for every x > N, |k(x)| < M).
In other words, in plain English: f(x) = O(g(x)), x → a, means that in a neighborhood of a, f decomposes into the product of g and some bounded function.
Small o
By the way, here is for comparison the definition of small o.
f(x) = o(g(x)) when x goes to a means that there is a function k such that:
f(x) = k(x)g(x)
k(x) goes to 0 when x goes to a.
Examples
sin x = O(x) when x → 0.
sin x = O(1) when x → +∞,
x² + x = O(x) when x → 0,
x² + x = O(x²) when x → +∞,
ln(x) = o(x) = O(x) when x → +∞.
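As a worked instance of the definition, take the example x² + x = O(x²) when x → +∞. One possible choice of k is shown below; the constants N = 1 and M = 2 are just one convenient pick, not the only one.

```latex
x^2 + x = k(x)\,x^2, \qquad k(x) = 1 + \frac{1}{x}, \qquad |k(x)| < 2 \quad \text{for every } x > 1 .
```

Likewise, for ln(x) = o(x) one can take k(x) = ln(x)/x, which goes to 0 when x → +∞.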
Attention! The notation with the equal sign "=" uses a "fake equality": it is true that o(g(x)) = O(g(x)), but false that O(g(x)) = o(g(x)). Similarly, it is ok to write "ln(x) = o(x) when x → +∞", but the formula "o(x) = ln(x)" would make no sense.
More examples
O(1) = O(n) = O(n²) when n → +∞ (but not the other way around, the equality is "fake"),
O(n) + O(n²) = O(n²) when n → +∞
O(O(n²)) = O(n²) when n → +∞
O(n²)O(n³) = O(n⁵) when n → +∞
Here is the Wikipedia article: https://en.wikipedia.org/wiki/Big_O_notation
Upvotes: 20
Reputation: 7738
Big O is a measure of how much time/space an algorithm uses relative to the size of its input.
If an algorithm is O(n) then the time/space will increase at the same rate as its input.
If an algorithm is O(n²) then the time/space will increase at the rate of its input squared.
and so on.
Upvotes: 39