## Tuesday, December 27, 2016

### Big Numbers

The human brain is ill-equipped to deal with large numbers. Drawing attention to large numbers is only partially effective. For instance, the following ticker in NYC displays the current US National Debt:

 The US National Debt clock at Union Square, NYC

That number is so big it becomes abstract and meaningless. In fact, the above picture is already from a while back. By the time you read this, the actual US National Debt is closer to \$20 trillion. A number so large it becomes hard to parse, especially as the image above is missing commas.

Maybe it helps if we write down the current amount, as of writing of this text, in words and compare it with another large country:

CountryDebt in DollarsDebt in Words/Speech
USA\$19,859,586,951,270Nineteen trillion, eight hundred fifty-nine billion, five hundred eighty-six million, nine hundred fifty-one thousand, two hundred seventy dollars.
Russia\$152,205,694,374One hundred fifty-two billion, two hundred five million, six hundred ninety-four thousand, three hundred seventy-four dollars.

If we actually do round off the US debt to \$20 billion, the Russian debt is actually the rounding error. That's how large the US number really is.

Payback Time

To get a feeling for the amount, what if you would spend 100% of your current net income to pay off the debt, how many years would it take you to pay off the US National Debt? For the average US citizen, it would take 525 million working years. If we combine the entire US workforce at roughly 155 million people and spend all their income on relieving the US debt, it would still take an entire Trump presidency to pay off the debt. This is not realistic for a few reasons, of course.

Even if each person in the US workforce would spend a modest 10% of their net income on paying off the national debt, it would take the entire country 34 years to pay off the current debt amount, assuming the US government was able to not grow the debt even larger.

Mind-boggling numbers.

Visualizing Large Amounts of Money

Visualizing the debt in the shape of \$100 bills may help. Here is the US National Debt in 2013, compared to an American Football field and Miss Liberty:

 US National Debt visualized in stacks of \$100 bills (link)

For reference, this is what one billion dollars looks like in stacks of \$100 bills:

It is rumored that Pablo Escobar lost \$2.3 billion each year due to rats eating the bills. That's 2.3X the amount showing above.

Pictures help. Check out demonocracy.info for many other visualizations of large amounts of money.

Short Scale and Large Scale

The large numbers we talked about so far are huge. But, we cannot even agree on what to call really large numbers. Two different scales to talk about large numbers are in use:
• Long scale. Every new term in the scale is one million times larger than the previous. A billion means a million millions (10^12), a trillion means a million billions (10^18), etc.
• Short scale. Every new term is one thousand times larger than the previous. A billion means a thousand millions (10^9), trillion means a thousand billions (10^12), etc.
For fun, here is what scale is being adopted by different countries across the world, including a couple of exceptions to the rule:

 Use of short and long scale across the world (wikipedia)

If you care, a milliardaire and a billionaire are equally rich. Confusing.

Algorithms and Big Numbers

Large numbers confound us. Reasoning about large numbers is even harder. So how can we make the analysis of algorithms more insightful? Applying scales helps. Writing out large numbers in words may help. Visualization definitely helps.

Say we have a certain algorithm that we run on a modest number of 50 elements. The algorithm has implementations with different time complexity. How efficient is each algorithm? How well does it scale when we increase the number of elements? Those are questions we care about when scaling to thousands, millions, or even billions of users.

For each class of complexity, the required operations for 50 elements are at the following scale:

ComplexityOperationsNumber of Operations in Words/Speech
O(1)1One
O(log n)4Four
O(n)50Fifty
O(n log n)195One hundred ninety-five
O(n^2)2 500Two thousand, five hundred
O(2^n)1 125 899 906 842 624One quadrillion, one hundred twenty-five trillion, eight hundred ninety-nine billion, nine hundred six million, eight hundred forty-two thousand, six hundred twenty-four
O(factorial)30 414 093 201 713 378 043 612 608 166 064 768 844 377 641 568 960 512 000 000 000 000Thirty vigintillion, four hundred fourteen novemdecillion, ninety-three octodecillion, two hundred one septendecillion, seven hundred thirteen sexdecillion, three hundred seventy-eight quindecillion, forty-three quattuordecillion, six hundred twelve tredecillion, six hundred eight duodecillion, one hundred sixty-six undecillion, sixty-four decillion, seven hundred sixty-eight nonillion, eight hundred forty-four octillion, three hundred seventy-seven septillion, six hundred forty-one sextillion, five hundred sixty-eight quintillion, nine hundred sixty quadrillion, five hundred twelve trillion

Those numbers may still not mean much. But, just read out the value for O(2^n). Does sound silly, right? The number for O(factorial) actually sounds laughable. This is what humans do with large numbers. Our brain does not know what to do with them, gives up, and turns it into a joke. BTW. Who made up those names? Vigintillion? Novemdecilion? Undecillion?

Over their training and/or career, most software engineers have developed an intuitive feeling for complexity theory. They know that in interview questions, proposing an implementation that is O(n^2) means you fell for the trap set by the interviewer. There must be a way to use a hashmap somewhere to get the cost down to O(n) or even O(1).

But does the candidate really understand why O(n^2) is so bad? How do O(n) and O(n^2) relate to each other as the number of elements grows? Let's try a visualization (click the play icon to replay):

In the interactive chart above, the number of elements are between 1 and 50 and are plotted from left to right. The vertical scale shows the number of operations, capped at 800. O(log n), O(n), and even O(n log n) stay linear for a long time. The big P refers to algorithms that can be solved in polynomial time, while NP refers to nondeterministic polynomial time.

When interviewing, you definitely do not want to be nondeterministic. Already for n=10, the purple line for O(n^2) is off the scale in the above chart. Search for solutions that are O(n), as is the case in linear search, or O(log n), the number of steps needed in binary search, or ideally O(1), when using a hashtable.