Data Structures and Algorithms
Notes: Big O Notation

Introduction

These notes cover the key concepts involved in Big-O notation. They are taken from Standish, Appendix A, pp. 451-471.

The motivation behind Big-O notation is that we need some way to measure the efficiency of programs, but there are so many differences between computers that we can't use real time to measure program efficiency. Therefore we use a more abstract concept to measure algorithm efficiency.

Why We Can't Compare Programs and Machines

Here are some data obtained by running a selection sort algorithm on a variety of different computers to sort 2000 integers (times in minutes; a sketch of the selection sort algorithm itself follows the table):
Type of computer     Time
Home PC              51.915
Desktop PC           11.508
Minicomputer          2.382
Mainframe             0.431
Supercomputer         0.087
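
For reference, here is a minimal sketch of selection sort in Python. The benchmark programs behind these timings were certainly not written in Python, so treat this as an illustration of the algorithm being measured, not the measured code itself:

    def selection_sort(a):
        # Repeatedly select the smallest remaining element and swap
        # it into the front of the unsorted region.
        n = len(a)
        for i in range(n - 1):
            smallest = i
            for j in range(i + 1, n):
                if a[j] < a[smallest]:
                    smallest = j
            a[i], a[smallest] = a[smallest], a[i]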

Standard programs used to compare the speed of different computers are known as benchmarks; there are whole collections of such programs maintained for exactly that purpose.

So we expect programs to take longer to run on a PC than on a supercomputer. What if one sorting algorithm takes 0.5 seconds to run on a PC and another sorting algorithm takes 0.5 seconds to run on a supercomputer? How will we be able to decide which algorithm is more efficient?

Similarly, the time an algorithm takes varies according to the number of items processed. In this example we measure the time, in seconds, that it takes to sort a variable number of items on two different types of computers. (A sketch of how such timings can be gathered follows the table.)
Array Size, N    Home Computer    Desktop Computer
  125                 12.5                2.8
  250                 49.3               11.0
  500                195.8               43.4
 1000                780.3              172.9
 2000               3114.9              690.5
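
One way to collect timings like these yourself is sketched below, using Python's time.perf_counter and the selection_sort sketch from above. The array sizes mirror the table; the absolute numbers you get will of course depend on your machine:

    import random
    import time

    for n in [125, 250, 500, 1000, 2000]:
        data = [random.randint(0, 10000) for _ in range(n)]
        start = time.perf_counter()
        selection_sort(data)  # the sketch shown earlier
        elapsed = time.perf_counter() - start
        print(f"N = {n:5d}  time = {elapsed:.4f} secs")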

Run times will also vary depending on the programming language that's used and on how good the compiler is. So there are many factors that affect the overall runtime of an algorithm.

Nevertheless, if you plotted the two sets of data from the above table, both curves would have the same basic shape -- that is, the shape of a second-order polynomial or a quadratic equation. A quadratic equation has the following general form:

an^2 + bn + c

The only real difference in the two equations occurs in the values of a, b, and c.
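
One way to see this is to fit a quadratic to each column of the table with least squares. A sketch using numpy's polyfit (assuming numpy is available); each call returns the fitted coefficients a, b, and c:

    import numpy as np

    n = np.array([125, 250, 500, 1000, 2000])
    home = np.array([12.5, 49.3, 195.8, 780.3, 3114.9])
    desktop = np.array([2.8, 11.0, 43.4, 172.9, 690.5])

    # Degree-2 least-squares fit: returns the coefficients [a, b, c]
    print(np.polyfit(n, home, 2))
    print(np.polyfit(n, desktop, 2))

The two fits differ only in the coefficients they return; the shape of the model, an^2 + bn + c, is the same for both.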

The Basic Insight of Big-O

Let's look again at the general quadratic equation, this time with numeric values for a, b, and c:

f(n) = 0.0001724n^2 + 0.0004n + 0.1

Let's think about what happens when n gets very, very large. The larger n gets, the more the n^2 term contributes to the overall value of the expression, as the following table shows. (The sketch after the table reproduces these figures.)
   N       f(n)     an^2    an^2 as % of total
  125       2.8      2.7          94.7
  250      11.0     10.8          98.2
  500      43.4     43.1          99.3
 1000     172.9    172.4          99.7
 2000     690.5    689.6          99.9
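
These figures are easy to reproduce: evaluate the full f(n) and the an^2 term by itself, then take the ratio. A quick sketch:

    a, b, c = 0.0001724, 0.0004, 0.1

    for n in [125, 250, 500, 1000, 2000]:
        f = a * n * n + b * n + c      # the full quadratic
        quad = a * n * n               # the dominant term alone
        print(f"N = {n:5d}  f(n) = {f:7.1f}  an^2 = {quad:7.1f}  "
              f"({100 * quad / f:.1f}% of total)")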

As this table illustrates, as n grows large, the n^2 term completely dominates the expression. Big-O notation makes use of dominant terms like this to create several classes of functions that serve as general measures of algorithm efficiency.

To form a Big-O expression, just identify the dominant term in the function and throw away the other terms. Thus, we can throw away the bn term and the constant term c. We can even throw away the coefficient a, since this merely affects how quickly the quadratic term grows, not its overall shape. For example, no matter how small the coefficient is, it will not turn the quadratic equation into a linear equation. And no matter how big it is, it will not turn the quadratic into an exponential function.

Classes of Functions

The following table summarizes the classes of functions that computer scientists use to classify algorithms.
Class of Function    O-Notation
Constant             O(1)
Logarithmic          O(log N)
Linear               O(N)
N log N              O(N log N)
Quadratic            O(N^2)
Cubic                O(N^3)
Exponential          O(2^N)
Exponential          O(10^N)

Now let's compare the running time of an algorithm, in microseconds and other time units, for different values of N, given these different classes of functions. (A sketch that generates the microsecond column follows the table.)
f(N)         N = 2    N = 1024 (microsecs)    N = 1024 (other units)
1              1      1                       1 microsec
log2 N         1      10                      10 microsecs
N              2      1.02 x 10^3             1.02 millisecs
N log2 N       2      1.02 x 10^4             10.2 millisecs
N^2            4      1.05 x 10^6             1.05 secs
N^3            8      1.07 x 10^9             17.9 mins
2^N            4      1.80 x 10^308           5.7 x 10^294 yrs
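
The microsecond column can be generated from the function classes themselves. A sketch (Decimal is used only because 2^1024 is slightly too large for a Python float to print):

    import math
    from decimal import Decimal

    functions = [
        ("1",        lambda n: 1),
        ("log2 N",   lambda n: int(math.log2(n))),
        ("N",        lambda n: n),
        ("N log2 N", lambda n: n * int(math.log2(n))),
        ("N^2",      lambda n: n ** 2),
        ("N^3",      lambda n: n ** 3),
        ("2^N",      lambda n: 2 ** n),
    ]

    for name, f in functions:
        # Each value is a running time in microseconds.
        print(f"{name:9s}  N=2: {f(2)}  N=1024: {Decimal(f(1024)):.3G}")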

Definition of O-Notation

We say that f(n) is O(g(n)) if there exist two positive constants K and n0 such that |f(n)| <= K|g(n)| for all n >= n0.

What this is saying is that, for high enough values of n, f(n) is bounded by K x g(n). The absolute values are brought in to account for negative functions -- i.e., those that lie below the horizontal axis.
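
To make the definition concrete, take the quadratic f(n) from earlier with g(n) = n^2. The constants below, K = 0.001 and n0 = 12, are just one workable pair found by trial, not anything canonical; the sketch spot-checks the inequality over a finite range of n:

    def f(n):
        return 0.0001724 * n * n + 0.0004 * n + 0.1

    K, n0 = 0.001, 12

    # Spot-check |f(n)| <= K * |g(n)| with g(n) = n^2 for n >= n0.
    assert all(abs(f(n)) <= K * n * n for n in range(n0, 100001))
    print(f"f(n) is O(n^2) with K = {K} and n0 = {n0}")

Many other pairs work as well; a larger K simply lets you get away with a smaller n0.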

Here's a graphical representation of this claim: [figure: |f(n)| and K|g(n)| plotted against n; for all n >= n0, the curve K|g(n)| stays above |f(n)|]