Notes: Big O Notation

The motivation behind Big-O notation is that we need some way
to measure the efficiency of programs, but there are so many
differences between computers that we can't use real (clock) time to
measure program efficiency. Therefore we use a more abstract
concept to measure **algorithm efficiency**. For example, the same
program can take very different amounts of time on different machines:

Type of computer | Time |
---|---|
Home PC | 51.915 |
Desktop PC | 11.508 |
Minicomputer | 2.382 |
Mainframe | 0.431 |
Supercomputer | 0.087 |

Standard programs used to compare the speed of different computers
are known as **benchmarks**, and collections of such programs exist
for exactly that purpose.

So we expect programs to take longer to run on a PC than on a supercomputer. What if one sorting algorithm takes 0.5 seconds to run on a PC and another sorting algorithm takes 0.5 seconds to run on a supercomputer? How will we be able to decide which algorithm is more efficient?

Similarly, the time an algorithm takes varies according to the number of items processed. In this example we measure the time it takes to sort a variable number of items on two different types of computers.

Array Size, N | Home Computer | Desktop Computer |
---|---|---|
125 | 12.5 | 2.8 |
250 | 49.3 | 11.0 |
500 | 195.8 | 43.4 |
1000 | 780.3 | 172.9 |
2000 | 3114.9 | 690.5 |

Run times will also vary depending on the programming language that's used and on how good the compiler is. So there are many factors that affect the overall runtime of an algorithm.

Nevertheless, if you plotted the two sets of data from the
above table, both curves would have the same basic shape -- that is,
the shape of a **second-order polynomial** or a **quadratic equation**.
A quadratic equation has the following general form:

f(n) = an^{2} + bn + c

The only real difference between the two curves occurs in the values of
*a*, *b*, and *c*.

Let's look again at the general quadratic equation, this time with numeric values for *a*, *b*, and *c*:

f(n) = 0.0001724n^{2} + 0.0004n + 0.1

Let's think about what happens when *n* gets very, very large.
The larger *n* gets, the more the n^{2} term contributes to the overall value
of the expression, as the following table shows.

N | f(n) | an^{2} | an^{2} as % of total |
---|---|---|---|
125 | 2.8 | 2.7 | 94.7 |
250 | 11.0 | 10.8 | 98.2 |
500 | 43.4 | 43.1 | 99.3 |
1000 | 172.9 | 172.4 | 99.7 |
2000 | 690.5 | 689.6 | 99.9 |
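The dominance of the quadratic term is easy to verify numerically. Here is a small Python sketch that recomputes the table's percentages from the quadratic above (the loop values and rounding are my own choices):

```python
# Recompute the "an^{2} as % of total" column from the example quadratic.
a, b, c = 0.0001724, 0.0004, 0.1

def f(n):
    """The example quadratic f(n) = a*n^2 + b*n + c."""
    return a * n * n + b * n + c

for n in [125, 250, 500, 1000, 2000]:
    pct = 100 * (a * n * n) / f(n)     # share contributed by the n^2 term
    print(n, round(f(n), 1), round(pct, 1))
```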

As this table illustrates, as *n* grows large, the n^{2}
term completely dominates the expression. Big-O notation makes use
of **dominant terms** like this to create several classes of
functions that serve as general measures of algorithm efficiency.

To form a Big-O expression, just identify the dominant term in
the function and throw away the other terms. Thus, we can throw
away the *bn* term and the constant *c* term. We can
even throw away the coefficient *a*, since it merely
affects how quickly the quadratic term grows, not its overall
shape. For example, no matter how small the coefficient is,
it will not turn the quadratic equation into a linear equation.
And no matter how big it is, it will not turn the quadratic into
an exponential function.
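To see why the coefficient never changes the class, consider this sketch: a quadratic with a tiny coefficient still eventually overtakes a linear function with a huge one. The coefficients below are arbitrary illustrations, not from the notes.

```python
# Arbitrary illustrative coefficients: tiny quadratic vs. large linear.
a = 1e-6     # quadratic coefficient (very small)
b = 1000.0   # linear coefficient (very large)

def quadratic(n):
    return a * n * n

def linear(n):
    return b * n

# The two curves cross where a*n^2 == b*n, i.e. at n == b/a.
crossover = b / a
print(crossover)  # 1000000000.0

# Below the crossover the linear function is larger; above it, the quadratic wins.
assert quadratic(crossover / 2) < linear(crossover / 2)
assert quadratic(crossover * 2) > linear(crossover * 2)
```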

The following table summarizes the different classes by which computer scientists classify algorithms.

Class of Function | O-Notation |
---|---|
Constant | O(1) |
Logarithmic | O(log N) |
Linear | O(N) |
N log N | O(N log N) |
Quadratic | O(N^{2}) |
Cubic | O(N^{3}) |
Exponential | O(2^{N}) |
Exponential | O(10^{N}) |
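Some of these classes correspond to familiar operations. The Python sketches below are hypothetical illustrations (the function names are mine, not from the notes):

```python
def constant(items):
    """O(1): one operation regardless of list size."""
    return items[0]

def linear_search(items, target):
    """O(N): may have to scan every element."""
    for i, x in enumerate(items):
        if x == target:
            return i
    return -1

def binary_search(items, target):
    """O(log N): halves the search range each step; items must be sorted."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def count_pairs(items):
    """O(N^2): nested loops over all pairs of elements."""
    count = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            count += 1
    return count
```

Note that `binary_search` requires its input to be sorted -- the logarithmic bound comes from discarding half of the remaining range at each step.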

Now let's compare the running time of an algorithm, in
**microseconds** and other time units, for different values of N,
given these different classes of functions.

f(N) | N = 2 | N = 1024 (microsecs) | N = 1024 (other units) |
---|---|---|---|
1 | 1 | 1 | 1 microsec |
log_{2}N | 1 | 10 | 10 microsecs |
N | 2 | 1.02 x 10^{3} | 1.02 millisec |
N log_{2}N | 2 | 1.02 x 10^{4} | 10.2 millisecs |
N^{2} | 4 | 1.05 x 10^{6} | 1.05 secs |
N^{3} | 8 | 1.07 x 10^{9} | 17.9 mins |
2^{N} | 4 | 1.80 x 10^{308} | 5.7 x 10^{294} yrs |
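The table entries can be reproduced with a quick computation. The unit conversions below are my own; note that 2^{1024} slightly exceeds the largest double-precision float, so it must be handled as a Python integer:

```python
import math

MICROS_PER_SEC = 1_000_000
SECS_PER_YEAR = 365 * 24 * 3600  # ignoring leap years

N = 1024
print(math.log2(N))                 # 10.0 -> 10 microsecs
print(N * math.log2(N))             # 10240.0 -> 1.02 x 10^4 microsecs
print(N**2 / MICROS_PER_SEC)        # ~1.05 secs
print(N**3 / MICROS_PER_SEC / 60)   # ~17.9 mins

# 2**1024 overflows a float, so divide as integers first.
years = (2**N // MICROS_PER_SEC) // SECS_PER_YEAR
print(f"{years:.1e}")               # ~5.7e+294 yrs
```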

Formally, a function f(n) is O(g(n)) if there exist positive
constants K and n_{0} such that |f(n)| <= K x |g(n)| for all
n >= n_{0}. What this is saying is that for high enough values of *n*, f(n)
is **bounded** by K x g(n). The absolute value is brought in to
account for negative functions -- i.e., those that lie below the
horizontal axis.
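The bound can be checked empirically for the quadratic from earlier. The witnesses K and n_{0} below are my own hand-picked choices; any larger values would also work:

```python
# Check |f(n)| <= K * |g(n)| for all tested n >= n0, with g(n) = n^2.
a, b, c = 0.0001724, 0.0004, 0.1

def f(n):
    return a * n * n + b * n + c

def g(n):
    return n * n

K, n0 = 0.0002, 100  # hand-picked witnesses for the bound

# Empirical spot-check over a finite range (a proof would cover all n >= n0).
assert all(abs(f(n)) <= K * abs(g(n)) for n in range(n0, 100_000))
print("f(n) is O(n^2) with K =", K, "and n0 =", n0)
```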

Here's a graphical representation of this claim: