The motivation behind Big-O notation is that we need some way to measure the efficiency of programs but there are so many differences between computers that we can't use real time to measure program efficiency. Therefore we use a more abstract concept to measure algorithm efficiency.
| Type of computer | Time |
|---|---|
| Home PC | 51.915 |
| Desktop PC | 11.508 |
| Minicomputer | 2.382 |
| Mainframe | 0.431 |
| Supercomputer | 0.087 |
Certain standard programs that are used to compare the speed of different computers are known as benchmarks. There are collections of such programs that are used for exactly that purpose.
So we expect programs to take longer to run on a PC than on a supercomputer. What if one sorting algorithm takes 0.5 seconds to run on a PC and another sorting algorithm takes 0.5 seconds to run on a supercomputer? How will we be able to decide which algorithm is more efficient?
Similarly, the time an algorithm takes varies according to the number of items processed. In this example we measure the time it takes to sort a variable number of items on two different types of computers.
| Array Size, N | Home Computer | Desktop Computer |
|---|---|---|
| 125 | 12.5 | 2.8 |
| 250 | 49.3 | 11.0 |
| 500 | 195.8 | 43.4 |
| 1000 | 780.3 | 172.9 |
| 2000 | 3114.9 | 690.5 |
Run times will also vary depending on the programming language that's used and on how good the compiler is. So there are many factors that affect the overall runtime of an algorithm.
Nevertheless, if you plotted the two sets of data from the above table, both curves would have the same basic shape -- that is, the shape of a second-order polynomial or a quadratic equation. A quadratic equation has the following general form:
an2 + bn + c
The only real difference in the two equations occurs in the values of a, b, and c.
Let's look again at the general quadratic equation, this time with numeric values for a, b, and c:
f(n) = 0.0001724n2 + 0.0004n + 0.1
Let's think about what happens when n gets very, very large. The larger n gets, the more n2 contributes to the overall value of the expressions, as the following table shows.
| N | f(n) | an2 | an2 as % of total |
|---|---|---|---|
| 125 | 2.8 | 2.7 | 94.7 |
| 250 | 11.0 | 10.8 | 98.2 |
| 500 | 43.4 | 43.1 | 99.3 |
| 1000 | 172.9 | 172.4 | 99.7 |
| 2000 | 690.5 | 689.6 | 99.9 |
As this table illustrates, as n grows large, the n2 terms completely dominates the expression. Big-O notation makes use of dominant terms like this to create several classes of functions that serve as general measures of algorithm efficiency.
To form a Big-O expression, just identify the dominant term in the function and throw away the other terms. Thus, we can throw away the n term and the constant c terms. We can even throw away the coefficient a since this merely effects how quickly the quadratic terms grows, not its overall shape. For example, no matter how small the coefficient is, it will not turn the quadratic equation into a linear equation. And no matter how big it is, it will not turn the quadratic into an exponentional function.
The following table summarizes the different classes by which computer scientists classify algorithms.
| Class of Function | O-Notation |
|---|---|
| Constant | O(1) |
| Logarithmic | O(log N) |
| Linear | O(N) |
| n log n | O(n log n) |
| Quadratic | O(N2) |
| Cubic | O(N3) |
| Exponential | O(2N) |
| Exponential | O(10N) |
Now let's compare running time of an algorithm, in microseconds and other time units, for different values of N, given these different classes of functions.
| f(N) | N = 2 | N=1024 (microsecs) | N=1024 (other units) |
|---|---|---|---|
| 1 | 1 | 1 | 1 microsec |
| log2N | 1 | 10 | 10 microsecs |
| N | 2 | 1.02 x 103 | 1.02 millisec |
| N log2N | 2 | 1.02 x 104 | 10.2 millisecs |
| N2 | 4 | 1.05 x 106 | 1.05 secs |
| N3 | 8 | 1.07 x 109 | 17.9 mins |
| 2N | 4 | 1.80 x 10308 | 5.7 x 10294 yrs |
What this is saying is that for high enough values of n f(n) is bounded by K x g(n). The absolute value is brought in to account for negative functions -- i.e., those that lie below the horizontal axis.
Here's a graphical representation of this claim: