Sunday, January 06, 2008
Our species has 10 fingers, and we (mostly) count in base 10. It's easy to learn. Base 10 seems natural, but it hasn't always been the favoured base. The Mayans used base 20 (vigesimal), the Babylonians used base 60 (sexagesimal), and our computers use base 2 (binary). What makes our base 10 so special, and more importantly, is 10 the optimal base we could use? Is there a better one?
First, we have to define what is meant by optimal and better in the context of a numeric base (or radix). The only definition of optimal numeric base that I know is the base at which the area A = (#digits in a given number) * (#unique symbols in base) is minimized for all numbers. That is, the optimal base is where, given all numbers that can be represented, the least "space" to represent them is the least amount of area given by the product of the number of digits in a number and the number of symbols required in that base system. 
Using least "area" as our metric for most optimal, the area is given by A = D * B. B = the base, which is also the number of symbols used in the given base; D = the number of digits which can be modeled with the function D(B,N) = logB(N) (or more exactly, it is the index of the leading digit needed to represent the number). Since over all numbers we also consider numbers approaching infinity, this equation for area doesn't tell us much in its present form, so instead we can work with a derivative of it (pun intended). As any intro calculus student will tell you, the maximum or minimum point of a concave surface (in this case, A = D * B) is where d2A/(dD*dB) = 0. If this can give us a solution independent of N (the value of any given number), then there is an optimal base for representing all numbers.
d2A/(dD*dB) = d/dD ( d/dB ( D(B,N) * B ) )
= d/dD ( D(B,N) + B * d/dB ( D(B,N) )
= d/dD [ D(B,N) + (-1)*D(B,N)/ln(B) ]
= 1 + (-1)/ln(B)
This is the slope of a concave up  curve, where the optimal base occurs at the minimum point and the slope equals zero (0).
1 + (-1)/ln(B) = 0
B = e
It turns out e is the optimal numeric base! I don't find this particularly surprising; e is everywhere. (Not to mention lack of surprise from having previously read the final answer) The problem with this is that we humans tend to use integer bases. Having an irrational base isn't unheard of, but we humans don't have much intuitive use for it. 
Optimal Integer Base
Just as Aiken and Tropp discussed , three (3) is thought to be the most optimal integer numeric base because it is closer to e than two (2).
Is three really the best integer base? We can do better than just saying "it's closer to e than 2". (but to be fair, that's not an unreasonable thing to say) The function is non-linear, which makes the immediate obviousness/truth of the statement questionable.  The original equation for the symbolic-spatial area used by representing any number is a function of the base and the value of the number, so it's hard to use it to study a trend for all numbers. The first derivative (with respect to #digits and base), however, is only a function of the base, and is a lot easier to work with.
A'(B) = 1 - 1/ln(B)
Plotting this function over the domain (2:3) shows not only the optimal base to use (where B = e, at the zero crossing), but just as importantly, it shows the rate of change in the representational area as a function of (change of) Base.
To decide whether 2 or 3 is closer to the optimal base in a more quantitative way, we can integrate A'(B) from 2 to e, and again from e to 3, and compare which of the two areas between the curve and the B-axis is smaller in magnitude. It should be obvious from the curve of A'(B), above, that the area between the curve and the B-axis is considerably smaller between e and 3.  This means that if we use a base of 3, there is less change of the total area required to represent numbers. So, three really is the optimal integer numeric base, assuming the original metric for the definition of optimal is reasonable.
This has implications on more than just representing numbers. Three as an optimal base also gives insight into optimal menus. It provides insight into tree traversal/creation. If we switch to a tertiary number system implemented in silicon, we may be able to implement an equal size of integer processing (e.g. the equivalent of 32 bits, or a max of 4 294 967 296, can be represented within 21 tits of ternary, with a max of 10 460 353 203) in a slightly less amount of silicon. Alternately, you might implement a larger numeric representation in a roughly equivalent silicon area. (I haven't done a formal analysis, myself, but I have seen some papers published on it) All the opinions I recall reading on this, though, have suggested that we're currently too deeply ingrained in binary physical implementations such that switching would probably be a bit of a setback given our 50ish years of optimizing for binary implementations. I suspect that if a ternary implementation really is better, it will only get explored if we hit a wall with Moore's Law for our current setup.
The Second most optimal integer base
We still don't know what the second most optimal integer numeric base is, but it's probably 2 or 4. Again, refer to the same expression of A'(B), but now over the domain (2:4), and do the same integration on either side of the optimal point of B = e.
Visually, it looks rather close, but numerically it's no contest. The area from 2->e is 0.132 and from e->4 is 0.209, so 2 is less of an area change from e and is a more optimal integer base than 4. Four (4) still gets the bronze medal for integer bases, though, and still deserves its props. Not only is base 4 (quaternary) the third most optimal integer numeric base, but it is strongly related to the second most optimal integer base, binary, like octal and hexadecimal. Base 4 is the native numeric base of DNA (it has four characters, A G C T, to represent Adenine, Guanine, Cytosine, and Thymine). Base 4 is still more optimal than the one currently in use, base 10.
Where does base 10 (decimal) rank
Where does decimal (base 10) rank in this scheme? Once more, here is the same curve of A'(B) plotted this time over the domain (2:10).
After finding the order of the first few, the rest fall in line. If you were to integrate in the same way, the ranking for optimal positive integer numeric bases are:
- base 3 (ternary)
- base 2 (binary)
- base 4 (quaternary)
- base 5 (quinary)
- base 6 (senary)
- base 7 (septenary)
- base 8 (octal)
- base 9 (nonary)
- base 10 (decimal)
For those interested in irrational bases , based on these calculations base π is more efficient that base 2, as well, but only by about half as much as base 3.
Burton MacKenZie http://www.burtonmackenzie.com/
 This is not my definition. I'm having a hard time finding where I originally found it. I thought it was in Hardy's "A Mathematician's apology", but I've been combing through that book and come up dry. At the time I read the definition (not the date it was published), the author stated "and this has been known for at least a hundred years". I'm still looking for a real reference. Please let me know if you find one. In the "Computer Oral History Collection, 1969-1973, 1977" from the Smithsonian National Museum of American History on page 140, Howard Aiken says: "...this is a much more efficient system than the binary number system...", to which Henry Tropp replies "Well, theoretically, e is the optimal phase and three is a little bit closer approximation to e than two is".
 On an unrelated tangent, "Concave Up" by Jesse Reklaw is a great collection of Dream comix.
 From Mathworld, "Bergman (1957/58) considered an irrational base, and Knuth (1998) considered transcendental bases. This leads to some rather unfamiliar results, such as equating π to 1 in base π, π = 10π.
 For instance, y(x) = A*x is a linear function. If x=4b, y(x)=y(4b)=A*4b = 4*(A*b) = 4*y(b). Four times its argument results in the function being four times as big.
y(x) = x*x*x is a nonlinear function. If x=4b, y(x)=y(4b)=4^3*b^3 = 64*b^3 = 64*y(b). Four times its argument results in the function being sixty-four times as big! It does not increase linearly.
 I couldn't be bothered to integrate A'(B) over B, as it turned out to be longer than a five minute problem. Luckily engineers are more pragmatic and we don't require an exact value. Inspection of the relative areas under the curve is sufficient. A quick numerical integration (rectangular, 1000 slices per side) gives the results of the area from 2->e is 0.132 and from e->3 is 0.0132. So, there is an order of magnitude less change using base 3 than there is using base 2.