Converging towards Benford’s Law
BY
WengHim Cheung
WITH
Frederick W. Strauch and Steven J. Miller, Advisors
A thesis submitted in partial fulfillment
of the requirements for the
Degree of Bachelor of Arts with Honors
in Physics
WILLIAMS COLLEGE
Williamstown, Massachusetts
2015
Abstract
We build upon previous work on a model for the fragmentation of a conserved quantity. Using new bounds on partition numbers, we refine a previous result on the magnitude of the conserved quantity required for Benford’s law to appear. We also study a model for breaking up a conserved quantity via a random bisection process, in which each quantity is randomly split into two pieces at each step. We show that this also leads to a Benford distribution, and we derive an expression for the deviation from the Benford distribution based on which step of the bisection process we are on.
Acknowledgements
I would like to thank Professors Frederick W. Strauch and Steven J. Miller, whose guidance and assistance in this thesis process have been invaluable.
Contents
1 Introduction
1.1 Benford's Law
1.2 Previous Work
1.2.1 The Partition Process
1.2.2 The Random Bisection Process
1.3 Our Project
2 Partitions
2.1 The Integer Partition Model
2.2 Bounds on P_H(X)
2.2.1 An Upper Bound
2.2.2 A Lower Bound
2.3 Bounds on ⟨n_j⟩
2.3.1 Upper Bound
2.3.2 Lower Bound
2.3.3 Combining Our Results
2.4 In Search of Benford's Law
2.4.1 Some Preliminaries
2.4.1.1 ⟨n_j²⟩
2.4.1.2 Covariance of n_j
2.4.2 Digit Distribution
2.4.2.1 P_d over One Order of Magnitude
2.4.2.2 P_d over All x_j
3 Random Bisections
3.1 The Random Bisection Model
3.2 Some Tools from Complex Analysis
3.3 Approaching Benford
3.3.1 The Integral at Infinity
3.3.2 The Residue Contribution
3.3.3 Deviation from Benford
4 Conclusion
Chapter 1
Introduction
Given a random entry drawn from a large data set, what are the chances that the first significant digit is 1? Many people would say 1/10; those who think about it for a bit might realize that 0 cannot be the first significant digit and say 1/9. But they would be wrong. In fact, for many large data sets, the correct answer is around 3/10. This is Benford’s law.
1.1 Benford’s Law
In 1881, Simon Newcomb noticed that the first pages of logarithmic tables wore out much faster than the later pages [New81]. He found that “[t]he first significant figure is oftener 1 than any other digit, and the frequency diminishes up to 9.” Over half a century later, in 1938, Frank Benford noticed the same thing [Ben38]. In analyzing a compilation of about 20,000 first digits taken from various sources, he found that, when the original numbers were at least four digits long, the first digits followed a logarithmic distribution.
A data set is said to satisfy Benford’s law (base 10) if the first-digit frequencies follow
\[
P_d = \log_{10}\left(1 + \frac{1}{d}\right). \tag{1.1}
\]
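As a quick illustration (ours, not part of the thesis), the nine probabilities of (1.1) can be tabulated directly; note that digit 1 gets about 30% while digit 9 gets under 5%, and that the nine values form a complete distribution:

```python
import math

# First-digit probabilities P_d = log10(1 + 1/d) for d = 1, ..., 9.
probs = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# d = 1 gets log10(2), roughly 3/10; d = 9 gets only about 4.6%.
assert abs(probs[1] - 0.30103) < 1e-5
assert abs(probs[9] - 0.04576) < 1e-5

# The probabilities sum to 1, since the sum telescopes to log10(10/1).
assert abs(sum(probs.values()) - 1.0) < 1e-12
```

The telescoping in the last comment is why exactly nine digit classes fit together: log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10) = 1.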
Benford’s law shows up in a surprisingly large number of settings. The set of Fibonacci numbers, for example, is Benford; this might not be very surprising, given that it is a purely mathematical construction. More surprisingly, the populations of countries around the world follow a roughly Benford distribution [Lon13]. Even data sets generated by individuals, such as the set of the most common iPhone passcodes [Lon13], can be roughly Benford.
One important thing to note about Benford’s law is that it is independent of units; that is, when the data is scaled by some constant factor, if it was Benford before, then it will be Benford after. For example, if a set of data measured in feet is Benford, then it will still be Benford measured in meters. This means that Benford’s law is a phenomenon not in how we count things, but in the things we count.
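This scale invariance is easy to see numerically. The sketch below (our illustration, not from the thesis; the sample size and tolerance are arbitrary choices) uses the fact that 10**U with U uniform on [0, 1) has an exactly Benford-distributed significand, and checks that rescaling by a unit-conversion factor leaves the first-digit frequencies essentially unchanged:

```python
import math
import random

def first_digit(x: float) -> int:
    """Leading significant digit of a positive number."""
    return int(10 ** (math.log10(x) % 1.0))

def digit_freqs(data):
    counts = [0] * 10
    for x in data:
        counts[first_digit(x)] += 1
    return [c / len(data) for c in counts[1:]]

random.seed(0)
sample = [10 ** random.random() for _ in range(100_000)]
feet = [3.2808 * x for x in sample]  # e.g. meters -> feet

original = digit_freqs(sample)
rescaled = digit_freqs(feet)
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    # Both data sets stay within sampling error of Benford.
    assert abs(original[d - 1] - benford) < 0.01
    assert abs(rescaled[d - 1] - benford) < 0.01
```

The key step is taking the fractional part of log10(x): multiplying by a constant only shifts log10(x), and a distribution whose log is uniform modulo 1 stays uniform modulo 1 under shifts.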
1.2 Previous Work
One place where Benford’s law shows up is in the breaking up of some given quantity. As a physical example of this, we can think of a stick being broken up into smaller pieces. When we break up the stick in a random way into a large number of pieces, we expect to see a distribution of the first digits of the sizes of the pieces that is roughly Benford; the more we break up the stick, the closer to Benford we expect to get.
1.2.1 The Partition Process
Based on this general process, Lemons came up with a model for fragmenting a conserved quantity [Lem86]. He found that this model led to Benford’s law in the appropriate limit, and from this, he claimed that any fragmentation of a conserved quantity would lead to a Benford distribution in the piece sizes. However, there were a few details in Lemons’s paper that were unclear; this is what led Iafrate to begin his project [Iaf14].
Iafrate’s goal was to completely start over, creating a model that was a refinement of Lemons’s model and that would hopefully also lead to Benford’s law. With a well-defined model in hand, he also wanted to determine the conditions necessary for a Benford distribution to result. While he was successful in both endeavors, his work suggested that his result for the latter problem could be improved.
In this model, we have a given integer X that is to be partitioned using integers x_j chosen from some specified set H = {x_1 = 1, x_2, ..., x_N}. What we want to figure out is when the expected number of times x_j shows up in a partition is given roughly by
\[
\langle n_j \rangle \approx \frac{X}{N x_j}, \tag{1.2}
\]
which we show leads to Benford’s law when the x_j are evenly distributed in the limit as the x_j range to infinity. Iafrate had found that a sufficient condition was to have
\[
X \gg N^{2} x_N; \tag{1.3}
\]
in this work, we improve this condition to
\[
X \gg N^{3/2} x_N. \tag{1.4}
\]
1.2.2 The Random Bisection Process
Another process for breaking up a conserved quantity that has been studied is the random bisection process. In this process, a conserved quantity is randomly broken up into two pieces by randomly choosing a bisection ratio from a uniform distribution; each of those pieces is then similarly randomly bisected into two pieces each, which are in turn bisected, and so on. In the limit as we go to infinitely many bisections, we recover Benford’s law. Of interest in this model is how far from a Benford distribution we can expect to be after only finitely many bisections. [JKK+09] and [BCGT+13] had previously studied this problem, using the Mellin transform as a tool for studying continuous random variables and presenting this problem as a specific application. We look at the problem from a different angle, using complex analysis to develop a formula for the deviation from Benford as a function of the bisection step we are on. Our end result is a first-digit probability for the digit A at the nth step of
\[
\log_{10}\left(1 + \frac{1}{A}\right) + \sum_{\substack{k=-\infty \\ k \neq 0}}^{\infty} \frac{e^{-2\pi i k \log_{10} A} - e^{-2\pi i k \log_{10}(A+1)}}{2\pi i k \left(1 + \frac{2\pi i k}{\log 10}\right)^{n}}, \tag{1.5}
\]
in which the sum, the deviation from Benford, goes to 0 as n goes to infinity, as expected.
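The random bisection process is also straightforward to simulate. The sketch below (ours, not from the thesis; the number of sticks, rounds, and the tolerance are arbitrary choices) shows the first-digit frequencies already close to Benford after six bisection rounds:

```python
import math
import random

def bisect_stick(total: float, rounds: int, rng: random.Random) -> list[float]:
    """Repeatedly split every piece at a uniformly random ratio;
    after `rounds` rounds a single stick becomes 2**rounds pieces."""
    pieces = [total]
    for _ in range(rounds):
        split = []
        for p in pieces:
            r = rng.random()  # uniform bisection ratio
            split += [r * p, (1 - r) * p]
        pieces = split
    return pieces

rng = random.Random(1)
pieces = []
for _ in range(500):  # average over 500 independent sticks
    pieces += bisect_stick(1.0, 6, rng)

counts = [0] * 10
for p in pieces:
    counts[int(10 ** (math.log10(p) % 1.0))] += 1
freqs = [c / len(pieces) for c in counts[1:]]

for d in range(1, 10):
    # Six rounds in, each digit frequency is already near log10(1 + 1/d).
    assert abs(freqs[d - 1] - math.log10(1 + 1 / d)) < 0.05
```

Marginally, a piece after n rounds is a product of n independent uniform factors, which is why the convergence is geometric in n.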
1.3 Our Project
Our first goal in this work is to refine Iafrate’s result on when Benford’s law would result from a fragmentation process. We also want to be able to say something about how far from Benford we can expect to be for a particular fragmentation process, since a Benford distribution is just the average distribution. Finally, we will study the random bisection process, in particular trying to derive a formula for how far we expect to be from a Benford distribution after finitely many steps.
Chapter 2
Partitions
In a fragmentation process, one of the most important characteristics is that the total quantity is conserved; that is, the sum of the parts is always the same. When we consider breaking up a stick, this simply means that when the stick is broken, we don’t end up with more stick than we started with and none of the stick is lost. Another thing to take into account in an actual fragmentation process (which, incidentally, also happens to make modeling it easier) is that there is some fundamental unit that constitutes the quantity being fragmented, so that each piece must be a multiple of that unit. A stick, for example, must be broken up into pieces consisting of integer numbers of atoms. Given these two conditions on fragmentation processes, it seems appropriate to look at integer partitions.
This process was previously studied by Iafrate [Iaf14], who, among other things, developed a model for the process and found a lower bound on the size of the conserved quantity relative to other parameters in the problem required to get Benford behavior in the appropriate limit. In this work, we improve upon Iafrate’s bound and carry out the calculations to show how Benford’s law arises in the limit as the model approaches the actual physical process.
2.1 The Integer Partition Model
A partition of some positive integer is a way of writing it as the sum of positive integers. For example, {1, 3, 5, 6} is a partition of 15, as are {5, 5, 5} and {15}. Furthermore, two partitions of an integer that differ only in the ordering of their terms are considered the same. The integer partition problem asks how many partitions there are of a given integer. This isn’t quite the problem we want to look at, since we might want to put some restrictions on how the fragmentation process can occur. For example, in breaking up a stick, we might want to say that each piece cannot be bigger than a certain size. Thus, what we want to consider is the more general restricted integer partition problem. In this problem, in partitioning a given integer, we are only allowed to use the integers in some specified set.
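Restricted partition counts are cheap to compute with the standard coin-change dynamic program, which is convenient for experimenting with the quantities studied in this chapter; a minimal sketch (ours, not from the thesis):

```python
def count_restricted_partitions(X: int, parts: list[int]) -> int:
    """Partitions of X into parts drawn (with repetition, order ignored)
    from `parts`: the quantity P_H(X) of this chapter."""
    ways = [1] + [0] * X  # ways[0] = 1 counts the empty partition
    for p in parts:
        for total in range(p, X + 1):
            ways[total] += ways[total - p]
    return ways[X]

# With H = {1, ..., 15} every partition of 15 is allowed: p(15) = 176.
assert count_restricted_partitions(15, list(range(1, 16))) == 176
# Restricting to H = {1, 2, 3} leaves 8 of the partitions of 7.
assert count_restricted_partitions(7, [1, 2, 3]) == 8
```

Processing one allowed part at a time is what makes order irrelevant, so this counts partitions rather than compositions.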
For our problem, let X denote the integer to be partitioned; X, then, represents the conserved quantity. Let H = {x_1, x_2, ..., x_N} be the set of integers we are allowed to use in our partitions. A restricted partition of X is then given by
\[
X = \sum_{i=1}^{N} n_i x_i, \tag{2.1}
\]
where the n_i are nonnegative integers. Let the number of such partitions be denoted by P_H(X). To allow us to draw conclusions that relate to the relative sizes of the elements of H, we order the x_j so that x_1 < x_2 < · · · < x_N. The quantity we will ultimately study is the expected number of copies of x_j in a partition of X chosen uniformly at random; since the partitions of X containing at least i copies of x_j correspond exactly to the partitions of X − ix_j, this expectation is
\[
\langle n_j \rangle = \frac{1}{P_H(X)} \sum_{i=1}^{\lfloor X/x_j \rfloor} P_H(X - i x_j). \tag{2.3}
\]

2.2 Bounds on P_H(X)

2.2.1 An Upper Bound

The upper bound we will use is Colman’s bound,
\[
P_H(X) \le \frac{1}{(N-1)!} \left( \prod_{i=1}^{N} \frac{1}{x_i} \right) \left( X + x_2 + \frac{1}{2} \sum_{i=3}^{N} x_i \right)^{N-1}. \tag{2.4}
\]
Before proving it, we need a lemma bounding sums of powers from above.

Lemma. Let A and B be positive numbers, and let k be an integer at least 2. Then
\[
\sum_{r=0}^{t} (Ar + B)^{k-1} \le \frac{1}{Ak} \left( At + \frac{A}{2} + B \right)^{k}. \tag{2.5}
\]

Proof. We first show that
\[
B^{k-1} \le \frac{1}{Ak} \left( \frac{A}{2} + B \right)^{k}. \tag{2.6}
\]
Setting x = B/A, this is equivalent to showing that (1/k)(x + 1/2)^k x^{-(k-1)} is at least 1 for all x > 0. Differentiating shows that x = (k − 1)/2 is a minimum. Plugging this in gives (1/2)(1 + 1/(k − 1))^{k−1}, which, since k ≥ 2, is at least 1, because (1 + 1/(k − 1))^{k−1} is at least 2. Next, the integral of (Ax + B)^{k−1} from −B/A to 1/2 evaluates to exactly the right side of (2.6). Combining this with (2.6), we then have
\[
B^{k-1} \le \frac{1}{Ak} \left( \frac{A}{2} + B \right)^{k} = \int_{-B/A}^{1/2} (Ax + B)^{k-1} \, dx. \tag{2.7}
\]
This result will be useful later.
Next, we note that, for x > 0, (Ax + B)^{k−1} is linear if k = 2 and concave up if k > 2. This means that, for each positive y, (Ay + B)^{k−1} is at most the average value of (Ax + B)^{k−1} over any interval of x centered at y that contains only positive values of x. In particular, for any positive integer r, we have
\[
(Ar + B)^{k-1} \le \int_{r - \frac{1}{2}}^{r + \frac{1}{2}} (Ax + B)^{k-1} \, dx. \tag{2.8}
\]
We are now ready to prove (2.5). We begin by separating out the r = 0 term in our sum, giving
\[
\sum_{r=0}^{t} (Ar + B)^{k-1} = B^{k-1} + \sum_{r=1}^{t} (Ar + B)^{k-1}. \tag{2.9}
\]
Using (2.7) on the first term and (2.8) on each of the terms in the sum, we then have
\[
\begin{aligned}
B^{k-1} + \sum_{r=1}^{t} (Ar + B)^{k-1}
&\le \int_{-B/A}^{1/2} (Ax + B)^{k-1} \, dx + \sum_{r=1}^{t} \int_{r - \frac{1}{2}}^{r + \frac{1}{2}} (Ax + B)^{k-1} \, dx \\
&= \int_{-B/A}^{t + \frac{1}{2}} (Ax + B)^{k-1} \, dx \\
&= \frac{1}{Ak} \left( At + \frac{A}{2} + B \right)^{k}.
\end{aligned} \tag{2.10}
\]
Combining (2.9) and (2.10) then gives us (2.5).
We now begin the proof of the upper bound for P_H(X). The basic idea is that we want to figure out the maximum number of copies of x_N that can fit into X, then for each possible number of copies of x_N that can be used in a partition of X determine the maximum number of copies of x_{N−1} that can also be used in a partition, and so on; when we get to looking at x_1, of course, we only get a valid partition if we use the maximum number of copies of x_1 that fit. Following this line of reasoning, we define, for 2 ≤ k ≤ N,
\[
S_k = \sum_{n_k=0}^{\lfloor L_k \rfloor} \cdots \sum_{n_2=0}^{\lfloor L_2 \rfloor} 1, \tag{2.11}
\]
where
\[
L_k = \frac{X - \sum_{i=k+1}^{N} n_i x_i}{x_k}. \tag{2.12}
\]
Here, L_k is just the maximum number of copies of x_k that can be used in a partition of X that also contains n_N copies of x_N, n_{N−1} copies of x_{N−1}, and so on, down to n_{k+1} copies of x_{k+1}. Note that, by rearranging terms, we get
\[
L_{k-1} = \frac{x_k (L_k - n_k)}{x_{k-1}}. \tag{2.13}
\]
S_k is just the total number of allowed partitions given that certain numbers of copies of x_N through x_{k+1} have been specified. With this notation, P_H(X) is just S_N. (If N = 1, then (2.4), with the x_2 term and the sum empty, gives P_H(X) ≤ 1, which is certainly true, as in that case, there is exactly one way to partition X, namely by using X copies of x_1 = 1.) We want to show that
\[
S_k \le \frac{1}{(k-1)!} \left( \prod_{i=1}^{k} \frac{1}{x_i} \right) \left( x_k L_k + x_2 + \frac{1}{2} \sum_{i=3}^{k} x_i \right)^{k-1}, \tag{2.14}
\]
as taking k = N and plugging in
\[
L_N = \frac{X}{x_N} \tag{2.15}
\]
will give us (2.4). We proceed by induction. As our base case, we take k = 2. This gives us
\[
S_2 = \sum_{n_2=0}^{\lfloor L_2 \rfloor} 1 = \lfloor L_2 \rfloor + 1 \le L_2 + 1 = \frac{1}{(2-1)!} \cdot \frac{1}{x_1 x_2} \left( x_2 L_2 + x_2 \right). \tag{2.16}
\]
Now, while this satisfies (2.14) for k = 2, it almost seems to do so somewhat trivially, since the sum on the right side of the inequality is an empty sum. We therefore prove the k = 3 case next; this will also serve to introduce the technique we will use in the induction step. We have
\[
S_3 = \sum_{n_3=0}^{\lfloor L_3 \rfloor} S_2 \le \sum_{n_3=0}^{\lfloor L_3 \rfloor} (L_2 + 1). \tag{2.17}
\]
Using (2.13), this becomes
\[
\sum_{n_3=0}^{\lfloor L_3 \rfloor} (L_2 + 1) = \frac{x_3}{x_2} \sum_{n_3=0}^{\lfloor L_3 \rfloor} \left( L_3 - n_3 + \frac{x_2}{x_3} \right). \tag{2.18}
\]
Next, we set j = ⌊L_3⌋ − n_3, so that our inequality becomes
\[
S_3 \le \frac{x_3}{x_2} \sum_{j=0}^{\lfloor L_3 \rfloor} \left( j + L_3 - \lfloor L_3 \rfloor + \frac{x_2}{x_3} \right). \tag{2.19}
\]
We then use (2.5), with A = 1 and B = L_3 − ⌊L_3⌋ + x_2/x_3. This gives us
\[
S_3 \le \frac{1}{2} \cdot \frac{x_3}{x_2} \left( L_3 + \frac{1}{2} + \frac{x_2}{x_3} \right)^{2}
= \frac{1}{(3-1)!} \left( \prod_{i=1}^{3} \frac{1}{x_i} \right) \left( x_3 L_3 + x_2 + \frac{1}{2} \sum_{i=3}^{3} x_i \right)^{2}, \tag{2.20}
\]
proving the k = 3 case. We now proceed to the induction step. Suppose that (2.14) holds for k = ℓ − 1. We want to show that it also holds for k = ℓ. Using (2.13), we have
\[
S_\ell = \sum_{n_\ell=0}^{\lfloor L_\ell \rfloor} S_{\ell-1}
\le \frac{1}{(\ell-2)!} \left( \prod_{i=1}^{\ell-1} \frac{1}{x_i} \right) x_\ell^{\ell-2} \sum_{n_\ell=0}^{\lfloor L_\ell \rfloor} \left( L_\ell - n_\ell + \frac{x_2}{x_\ell} + \frac{1}{2x_\ell} \sum_{i=3}^{\ell-1} x_i \right)^{\ell-2}. \tag{2.21}
\]
Similarly to what we did in the k = 3 case, we now set j = ⌊L_ℓ⌋ − n_ℓ, giving us
\[
S_\ell \le \frac{1}{(\ell-2)!} \left( \prod_{i=1}^{\ell-1} \frac{1}{x_i} \right) x_\ell^{\ell-2} \sum_{j=0}^{\lfloor L_\ell \rfloor} \left( j + L_\ell - \lfloor L_\ell \rfloor + \frac{x_2}{x_\ell} + \frac{1}{2x_\ell} \sum_{i=3}^{\ell-1} x_i \right)^{\ell-2}. \tag{2.22}
\]
We now use (2.5), giving us
\[
S_\ell \le \frac{1}{(\ell-1)!} \left( \prod_{i=1}^{\ell-1} \frac{1}{x_i} \right) x_\ell^{\ell-2} \left( L_\ell + \frac{1}{2} + \frac{x_2}{x_\ell} + \frac{1}{2x_\ell} \sum_{i=3}^{\ell-1} x_i \right)^{\ell-1}
= \frac{1}{(\ell-1)!} \left( \prod_{i=1}^{\ell} \frac{1}{x_i} \right) \left( x_\ell L_\ell + x_2 + \frac{1}{2} \sum_{i=3}^{\ell} x_i \right)^{\ell-1}. \tag{2.23}
\]
Thus, (2.14) holds for all 2 ≤ k ≤ N. Combining this result with (2.15) then gives us (2.4).
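A numerical spot check (ours, not part of the proof) is reassuring here. Reading (2.4) as P_H(X) ≤ (1/(N−1)!) (Π 1/x_i) (X + x_2 + ½ Σ_{i≥3} x_i)^{N−1} for sets with x_1 = 1, we can compare against exact counts:

```python
from math import factorial, prod

def count_partitions(X: int, parts: list[int]) -> int:
    # Coin-change dynamic program for the exact P_H(X).
    ways = [1] + [0] * X
    for p in parts:
        for t in range(p, X + 1):
            ways[t] += ways[t - p]
    return ways[X]

def colman_bound(X: int, parts: list[int]) -> float:
    # Right side of (2.4); parts must be sorted with parts[0] == 1.
    xs = sorted(parts)
    N = len(xs)
    main = X + xs[1] + 0.5 * sum(xs[2:])  # X + x_2 + (1/2) sum_{i>=3} x_i
    return main ** (N - 1) / (factorial(N - 1) * prod(xs))

for X, H in [(7, [1, 2, 3]), (10, [1, 2]), (8, [1, 2, 4]), (30, [1, 3, 4, 7])]:
    assert count_partitions(X, H) <= colman_bound(X, H) + 1e-9
```

For H = {1, 2} with X a multiple of 2 the bound is attained exactly, which matches the equality case visible in the k = 2 base step.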
2.2.2 A Lower Bound
To find a lower bound for P_H(X), we will use an approach similar to that used to find Colman’s upper bound. First, however, we will develop the lower-bound version of (2.5). This turns out to be a very different problem, since we have to be much more careful with our bounds because (Ar + B)^{k−1} is concave up (for k > 2). In fact, we end up with a much messier result than (2.5); in addition to a main term (which is identical to that in (2.5)), we also need some lower-order terms so that we end up with a bound from below.

Lemma. Let A and B be positive numbers, and let k be an integer at least 2. Then
\[
\sum_{r=0}^{t} (Ar + B)^{k-1} \ge \frac{1}{Ak} \left( At + \frac{A}{2} + B \right)^{k} - \frac{k-1}{8} A \left( At + \frac{A}{2} + B \right)^{k-2} + \frac{B^{k-1}}{2} - \frac{B^{k}}{Ak}. \tag{2.24}
\]

Proof. First of all, we can pull a factor of A^{k−1} out of the sum and set C = B/A, giving us
\[
\sum_{r=0}^{t} (Ar + B)^{k-1} = A^{k-1} \sum_{r=0}^{t} (r + C)^{k-1}. \tag{2.25}
\]
We can then focus on just the sum, inserting the factor of A^{k−1} at the end. We use the binomial theorem to expand our sum; unlike what we did in the proof of (2.5), we keep every term here. This gives us
\[
\begin{aligned}
\sum_{r=0}^{t} (r + C)^{k-1}
&= \sum_{r=0}^{t} \sum_{j=0}^{k-1} \binom{k-1}{j} C^{(k-1)-j} r^{j} \\
&= \sum_{j=0}^{k-1} \binom{k-1}{j} C^{(k-1)-j} \sum_{r=0}^{t} r^{j} \\
&= C^{k-1}(t+1) + \sum_{j=1}^{k-1} \binom{k-1}{j} C^{(k-1)-j} \sum_{r=0}^{t} r^{j}.
\end{aligned} \tag{2.26}
\]
We first deal with the inside sum. We begin by rewriting it as
\[
\sum_{r=0}^{t} r^{j} = \frac{1}{2} t^{j} + \sum_{r=0}^{t-1} \left[ \frac{1}{2} r^{j} + \frac{1}{2} (r+1)^{j} \right]. \tag{2.27}
\]
The term inside the sum is just the average of x^j evaluated at x = r and x^j evaluated at x = r + 1. Now, note that x^j is either linear or concave up. For such a function f(x), we have
\[
\frac{1}{b - a} \int_{a}^{b} f(x) \, dx \le \frac{1}{2} \left( f(a) + f(b) \right); \tag{2.28}
\]
this basically just says that the line between two points on the curve does not lie below the curve. Applying this to x^j, we then get
\[
\sum_{r=0}^{t} r^{j} \ge \frac{1}{2} t^{j} + \sum_{r=0}^{t-1} \int_{r}^{r+1} s^{j} \, ds
= \frac{1}{2} t^{j} + \int_{0}^{t} s^{j} \, ds
= \frac{1}{2} t^{j} + \frac{1}{j+1} t^{j+1}. \tag{2.29}
\]
This isn’t quite what we want; remember that we are trying to get the lower-bound version of (2.5), so we want something with a (t + 1/2)^{j+1}/(j+1) in it. We consider
\[
\frac{1}{j+1} \left( t + \frac{1}{2} \right)^{j+1} - \frac{j}{8} \left( t + \frac{1}{2} \right)^{j-1}.
\]
We expand these out using the binomial theorem, giving
\[
\begin{aligned}
\frac{1}{j+1} \left( t + \frac{1}{2} \right)^{j+1} - \frac{j}{8} \left( t + \frac{1}{2} \right)^{j-1}
&= \frac{1}{j+1} t^{j+1} + \frac{1}{2} t^{j}
+ \frac{1}{j+1} \sum_{i=2}^{j+1} \binom{j+1}{i} \frac{t^{(j+1)-i}}{2^{i}}
- \frac{j}{8} \sum_{i=0}^{j-1} \binom{j-1}{i} \frac{t^{(j-1)-i}}{2^{i}} \\
&= \frac{1}{j+1} t^{j+1} + \frac{1}{2} t^{j}
- \frac{j}{4} \sum_{i=0}^{j-1} \binom{j-1}{i} \frac{t^{(j-1)-i}}{2^{i}} \left[ \frac{1}{2} - \frac{1}{(i+1)(i+2)} \right].
\end{aligned} \tag{2.30}
\]
Consider the term inside the sum. Since i ≥ 0, we have 1/2 − 1/((i+1)(i+2)) ≥ 0, so because the other factors in the term are positive, the entire term being summed over i is nonnegative for each i. Thus, the sum as a whole is nonnegative. Since it is being subtracted, we can simply drop it to get an upper bound, giving us
\[
\frac{1}{j+1} \left( t + \frac{1}{2} \right)^{j+1} - \frac{j}{8} \left( t + \frac{1}{2} \right)^{j-1} \le \frac{1}{2} t^{j} + \frac{1}{j+1} t^{j+1}. \tag{2.31}
\]
Applying (2.29) and (2.31) to (2.26), we then have
\[
\sum_{r=0}^{t} (r + C)^{k-1} \ge C^{k-1}(t+1)
+ \sum_{j=1}^{k-1} \binom{k-1}{j} C^{(k-1)-j} \left[ \frac{1}{j+1} \left( t + \frac{1}{2} \right)^{j+1} - \frac{j}{8} \left( t + \frac{1}{2} \right)^{j-1} \right]. \tag{2.32}
\]
Rearranging these terms then gives us
\[
\begin{aligned}
\sum_{r=0}^{t} (r + C)^{k-1}
&\ge C^{k-1}(t+1) + \frac{1}{k} \sum_{i=2}^{k} \binom{k}{i} C^{k-i} \left( t + \frac{1}{2} \right)^{i}
- \frac{k-1}{8} \sum_{i=0}^{k-2} \binom{k-2}{i} C^{(k-2)-i} \left( t + \frac{1}{2} \right)^{i} \\
&= C^{k-1}(t+1) + \frac{1}{k} \left( t + \frac{1}{2} + C \right)^{k} - \frac{1}{k} C^{k} - C^{k-1} \left( t + \frac{1}{2} \right)
- \frac{k-1}{8} \left( t + \frac{1}{2} + C \right)^{k-2} \\
&= \frac{1}{k} \left( t + \frac{1}{2} + C \right)^{k} - \frac{k-1}{8} \left( t + \frac{1}{2} + C \right)^{k-2} + \frac{1}{2} C^{k-1} - \frac{1}{k} C^{k}.
\end{aligned} \tag{2.33}
\]
Substituting back C = B/A and multiplying through by A^{k−1} then gives (2.24).
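The two lemmas now sandwich the power sum from both sides; a quick numerical check (ours, not part of the proof), taking (2.5) and (2.24) in the forms stated above:

```python
def power_sum(A, B, t, k):
    return sum((A * r + B) ** (k - 1) for r in range(t + 1))

def upper(A, B, t, k):
    # Right side of (2.5).
    return (A * t + A / 2 + B) ** k / (A * k)

def lower(A, B, t, k):
    # Right side of (2.24).
    main = A * t + A / 2 + B
    return (main ** k / (A * k)
            - (k - 1) / 8 * A * main ** (k - 2)
            + B ** (k - 1) / 2
            - B ** k / (A * k))

for A, B, t, k in [(1, 0.5, 10, 2), (2, 3, 25, 4), (0.5, 1.2, 40, 6)]:
    s = power_sum(A, B, t, k)
    assert lower(A, B, t, k) <= s <= upper(A, B, t, k)
```

For k = 2 with B = 1/2 both bounds collapse onto the sum exactly, which is a useful check on the lower-order terms of (2.24).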
We now turn our attention to finding a lower bound for P_H(X). What we will end up showing is
\[
P_H(X) > \frac{1}{(N-1)!} \left( \prod_{x_j \in H} \frac{1}{x_j} \right) \left( X + \frac{1}{2} \sum_{i=4}^{N} x_i \right)^{N-1}
- \frac{x_N^{2}}{8(N-4)!} \left( \prod_{x_j \in H} \frac{1}{x_j} \right) \left( X + \frac{1}{2} \sum_{i=4}^{N} x_i \right)^{N-3}. \tag{2.34}
\]
This isn’t defined for N < 4, as the factorial in the second term would then be undefined. Just to be thorough in our solution, if N < 4, we will simply ignore the second term in (2.34). (For N = 1, we have an equality rather than an inequality, as the right side of (2.34), ignoring the second term, is 1.) As in the proof of the upper bound, we will use the quantities S_k and L_k, defined for 2 ≤ k ≤ N by (2.11) and (2.12), respectively. As before, with these definitions, we have P_H(X) = S_N. Since the N = 2 and N = 3 cases are different, we will deal with those first.
We begin by considering the N = 2 case, working with k = 2 so that we can use our results for that later and only setting N = 2 at the end. We have
\[
S_2 = \sum_{n_2=0}^{\lfloor L_2 \rfloor} 1 = \lfloor L_2 \rfloor + 1 > L_2. \tag{2.35}
\]
Setting N = 2, so that L_2 = X/x_2, gives us
\[
P_H(X) = S_2 > \frac{1}{(2-1)!} \cdot \frac{1}{x_1 x_2} X^{2-1}, \tag{2.36}
\]
which is exactly what (2.34) gives.
Let us now consider the N = 3 case, as before by looking at k = 3 and then setting N = 3 at the end. We have, using (2.13),
\[
S_3 = \sum_{n_3=0}^{\lfloor L_3 \rfloor} S_2 > \sum_{n_3=0}^{\lfloor L_3 \rfloor} L_2 = \frac{x_3}{x_2} \sum_{n_3=0}^{\lfloor L_3 \rfloor} (L_3 - n_3). \tag{2.37}
\]
Setting j = ⌊L_3⌋ − n_3 gives us
\[
S_3 > \frac{x_3}{x_2} \sum_{j=0}^{\lfloor L_3 \rfloor} \left( j + L_3 - \lfloor L_3 \rfloor \right). \tag{2.38}
\]
We now apply (2.24), giving us
\[
S_3 > \frac{x_3}{x_2} \left[ \frac{1}{2} \left( L_3 + \frac{1}{2} \right)^{2} - \frac{1}{8} + \frac{L_3 - \lfloor L_3 \rfloor}{2} - \frac{(L_3 - \lfloor L_3 \rfloor)^{2}}{2} \right]. \tag{2.39}
\]
Since 0 ≤ L_3 − ⌊L_3⌋ < 1, the last two terms together are nonnegative, so we can drop them. This then gives us
\[
S_3 > \frac{x_3}{x_2} \left[ \frac{1}{2} \left( L_3 + \frac{1}{2} \right)^{2} - \frac{1}{8} \right]
= \frac{x_3}{2x_2} \left( L_3^{2} + L_3 \right)
\ge \frac{x_3}{2x_2} L_3^{2}. \tag{2.40}
\]
Setting N = 3, so that L_3 = X/x_3, then gives us
\[
P_H(X) = S_3 > \frac{1}{(3-1)!} \cdot \frac{1}{x_1 x_2 x_3} X^{3-1}, \tag{2.41}
\]
which is exactly what (2.34) gives. Having dealt with the problematic cases, we now take N ≥ 4. We will use induction to show that, for k ≥ 4, we have
\[
S_k > \frac{x_k^{k-1}}{(k-1)! \, \prod_{i=2}^{k} x_i} \left( L_k + \frac{1}{2x_k} \sum_{i=4}^{k} x_i \right)^{k-1}
- \frac{x_k^{k-1}}{8(k-4)! \, \prod_{i=2}^{k} x_i} \left( L_k + \frac{1}{2x_k} \sum_{i=4}^{k} x_i \right)^{k-3}; \tag{2.42}
\]
setting k = N will then give us (2.34). As our base case, we take k = 4. We have
\[
S_4 = \sum_{n_4=0}^{\lfloor L_4 \rfloor} S_3 > \frac{x_3}{2x_2} \sum_{n_4=0}^{\lfloor L_4 \rfloor} L_3^{2}. \tag{2.43}
\]
Using (2.13), setting j = ⌊L_4⌋ − n_4, and then applying (2.24), we get
\[
S_4 > \frac{x_4^{2}}{6 x_2 x_3} \left( L_4 + \frac{1}{2} \right)^{3} - \frac{x_4^{2}}{8 x_2 x_3} \left( L_4 + \frac{1}{2} \right)
+ \frac{x_4^{2}}{2 x_2 x_3} \left[ \frac{(L_4 - \lfloor L_4 \rfloor)^{2}}{2} - \frac{(L_4 - \lfloor L_4 \rfloor)^{3}}{3} \right]. \tag{2.44}
\]
As L_4 − ⌊L_4⌋ ≤ 3/2, the last term is nonnegative, so we can drop it. This leaves us with
\[
S_4 > \frac{x_4^{3}}{(4-1)! \, \prod_{i=2}^{4} x_i} \left( L_4 + \frac{1}{2x_4} \sum_{i=4}^{4} x_i \right)^{3}
- \frac{x_4^{3}}{8(4-4)! \, \prod_{i=2}^{4} x_i} \left( L_4 + \frac{1}{2x_4} \sum_{i=4}^{4} x_i \right), \tag{2.45}
\]
proving our base case.
We now move on to the induction step. Suppose that (2.42) holds for k = ℓ − 1. We want to show that it also holds for k = ℓ. Using (2.13) and setting j = ⌊L_ℓ⌋ − n_ℓ as before, we have
\[
S_\ell = \sum_{n_\ell=0}^{\lfloor L_\ell \rfloor} S_{\ell-1}
> \frac{x_\ell^{\ell-2}}{(\ell-2)! \, \prod_{i=2}^{\ell-1} x_i} \sum_{j=0}^{\lfloor L_\ell \rfloor} \left( j + L_\ell - \lfloor L_\ell \rfloor + \frac{1}{2x_\ell} \sum_{i=4}^{\ell-1} x_i \right)^{\ell-2}
- \frac{x_{\ell-1}^{2} x_\ell^{\ell-4}}{8(\ell-5)! \, \prod_{i=2}^{\ell-1} x_i} \sum_{j=0}^{\lfloor L_\ell \rfloor} \left( j + L_\ell - \lfloor L_\ell \rfloor + \frac{1}{2x_\ell} \sum_{i=4}^{\ell-1} x_i \right)^{\ell-4}. \tag{2.46}
\]
We now apply (2.24) to the first term. Since the second term is being subtracted, we apply (2.5) to it instead. Writing B = L_ℓ − ⌊L_ℓ⌋ + (1/(2x_ℓ)) Σ_{i=4}^{ℓ−1} x_i and noting that ⌊L_ℓ⌋ + 1/2 + B = L_ℓ + (1/(2x_ℓ)) Σ_{i=4}^{ℓ} x_i, this gives us
\[
\begin{aligned}
S_\ell > {} & \frac{x_\ell^{\ell-2}}{(\ell-1)! \, \prod_{i=2}^{\ell-1} x_i} \left( L_\ell + \frac{1}{2x_\ell} \sum_{i=4}^{\ell} x_i \right)^{\ell-1}
- \frac{x_\ell^{\ell-2}}{8(\ell-3)! \, \prod_{i=2}^{\ell-1} x_i} \left( L_\ell + \frac{1}{2x_\ell} \sum_{i=4}^{\ell} x_i \right)^{\ell-3} \\
& + \frac{x_\ell^{\ell-2}}{(\ell-2)! \, \prod_{i=2}^{\ell-1} x_i} \left( \frac{1}{2} - \frac{B}{\ell-1} \right) B^{\ell-2}
- \frac{x_{\ell-1}^{2} x_\ell^{\ell-4}}{8(\ell-3)(\ell-5)! \, \prod_{i=2}^{\ell-1} x_i} \left( L_\ell + \frac{1}{2x_\ell} \sum_{i=4}^{\ell} x_i \right)^{\ell-3}. \tag{2.47}
\end{aligned}
\]
We now look at the third term. Since the x_i are labeled in increasing order, B ≤ 1 + (ℓ − 4)/2, so B/(ℓ − 1) ≤ 1/2 and the third term is nonnegative; we can therefore drop it. Next, since x_{ℓ−1} ≤ x_ℓ and the fourth term is being subtracted, we can multiply it by x_ℓ²/x_{ℓ−1}². This leaves us with
\[
S_\ell > \frac{x_\ell^{\ell-1}}{(\ell-1)! \, \prod_{i=2}^{\ell} x_i} \left( L_\ell + \frac{1}{2x_\ell} \sum_{i=4}^{\ell} x_i \right)^{\ell-1}
- \left[ \frac{1}{8(\ell-3)!} + \frac{1}{8(\ell-3)(\ell-5)!} \right] \frac{x_\ell^{\ell-1}}{\prod_{i=2}^{\ell} x_i} \left( L_\ell + \frac{1}{2x_\ell} \sum_{i=4}^{\ell} x_i \right)^{\ell-3}. \tag{2.48}
\]
This matches (2.42) except for the coefficient of the second term. But
\[
\frac{1}{8(\ell-3)!} + \frac{1}{8(\ell-3)(\ell-5)!} = \frac{1}{8(\ell-3)!} \left( 1 + \ell - 4 \right) = \frac{\ell-3}{8(\ell-3)!} = \frac{1}{8(\ell-4)!}, \tag{2.49}
\]
completing our proof by induction. Taking k = N and using (2.15) then gives us (2.34).
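As with the upper bound, a numerical spot check (ours, not part of the proof) is worthwhile. Reading (2.34) as the main term minus the correction term, both scaled by Π 1/x_i, for sets with x_1 = 1 and N ≥ 4:

```python
from math import factorial, prod

def count_partitions(X: int, parts: list[int]) -> int:
    # Coin-change dynamic program for the exact P_H(X).
    ways = [1] + [0] * X
    for p in parts:
        for t in range(p, X + 1):
            ways[t] += ways[t - p]
    return ways[X]

def lower_bound(X: int, parts: list[int]) -> float:
    # Right side of (2.34); parts sorted, parts[0] == 1, N >= 4.
    xs = sorted(parts)
    N = len(xs)
    main = X + 0.5 * sum(xs[3:])  # X + (1/2) sum_{i>=4} x_i
    return (main ** (N - 1) / factorial(N - 1)
            - xs[-1] ** 2 * main ** (N - 3) / (8 * factorial(N - 4))) / prod(xs)

for X, H in [(20, [1, 2, 3, 4]), (24, [1, 2, 4, 6])]:
    assert count_partitions(X, H) > lower_bound(X, H)
```

For H = {1, 2, 3, 4} and X = 20, for instance, the exact count is 108 while the bound evaluates to roughly 72, so the inequality holds with room to spare at small X.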
2.3 Bounds on ⟨n_j⟩
Having established bounds on P_H(X), we can now go back to looking at ⟨n_j⟩. To find an upper bound, we can use the upper bound for the P_H(X − ix_j) term in (2.3) and the lower bound for the P_H(X) term; similarly, to find a lower bound, we can use the lower bound for the P_H(X − ix_j) term and the upper bound for the P_H(X) term.
2.3.1 Upper Bound
We begin by calculating the upper bound. We have
\[
\begin{aligned}
\langle n_j \rangle
&< \frac{\frac{1}{(N-1)!} \left( \prod_{k=1}^{N} \frac{1}{x_k} \right) \sum_{i=1}^{\lfloor X/x_j \rfloor} \left( X - ix_j + x_2 + \frac{1}{2} \sum_{k=3}^{N} x_k \right)^{N-1}}
{\frac{1}{(N-1)!} \left( \prod_{k=1}^{N} \frac{1}{x_k} \right) \left( X + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-1} - \frac{x_N^{2}}{8(N-4)!} \left( \prod_{k=1}^{N} \frac{1}{x_k} \right) \left( X + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-3}} \\
&= \frac{\sum_{i=1}^{\lfloor X/x_j \rfloor} \left( X - ix_j + x_2 + \frac{1}{2} \sum_{k=3}^{N} x_k \right)^{N-1}}
{\left( X + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-1} - \frac{(N-1)(N-2)(N-3) x_N^{2}}{8} \left( X + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-3}}.
\end{aligned} \tag{2.50}
\]
From here, since we are looking for an upper bound, and since the term inside the sum is nonincreasing in i, we can replace the sum with an integral from 0 to X/x_j. The sum then becomes
\[
\int_{0}^{X/x_j} \left( X - rx_j + x_2 + \frac{1}{2} \sum_{k=3}^{N} x_k \right)^{N-1} dr
= \frac{1}{Nx_j} \left[ \left( X + x_2 + \frac{1}{2} \sum_{k=3}^{N} x_k \right)^{N} - \left( x_2 + \frac{1}{2} \sum_{k=3}^{N} x_k \right)^{N} \right]. \tag{2.51}
\]
We can then pull out factors of X from our bound so far, leaving us with
\[
\langle n_j \rangle < \frac{X}{Nx_j} \cdot
\frac{\left( 1 + \frac{x_2}{X} + \frac{1}{2X} \sum_{k=3}^{N} x_k \right)^{N} - \left( \frac{x_2}{X} + \frac{1}{2X} \sum_{k=3}^{N} x_k \right)^{N}}
{\left( 1 + \frac{1}{2X} \sum_{k=4}^{N} x_k \right)^{N-1} - \frac{(N-1)(N-2)(N-3) x_N^{2}}{8X^{2}} \left( 1 + \frac{1}{2X} \sum_{k=4}^{N} x_k \right)^{N-3}}. \tag{2.52}
\]
At this point, the expression we have is too messy to be of much use to us. However, we can make some simplifications by using the assumption that X ≫ 1, N. First of all, we can drop the part of the numerator that is being subtracted, since it is proportional to X^{−N}. Next, we Taylor expand each term to first order in X^{−1}. First, however, we rewrite the factor with the terms in the denominator as
\[
\frac{1}{\left( 1 + \frac{1}{2X} \sum_{k=4}^{N} x_k \right)^{N-1} - \frac{(N-1)(N-2)(N-3) x_N^{2}}{8X^{2}} \left( 1 + \frac{1}{2X} \sum_{k=4}^{N} x_k \right)^{N-3}}
= \frac{\left( 1 + \frac{1}{2X} \sum_{k=4}^{N} x_k \right)^{-(N-1)}}{1 - \frac{(N-1)(N-2)(N-3) x_N^{2}}{8X^{2}} \left( 1 + \frac{1}{2X} \sum_{k=4}^{N} x_k \right)^{-2}}. \tag{2.53}
\]
We now Taylor expand and drop small terms, giving us
\[
\langle n_j \rangle < \frac{X}{Nx_j} \left[ 1 + \frac{1-N}{2X} \sum_{k=4}^{N} x_k \right] \left[ 1 + \frac{Nx_2}{X} + \frac{N}{2X} \sum_{k=3}^{N} x_k \right] \left[ 1 + \frac{N^{3} x_N^{2}}{8X^{2}} \right]; \tag{2.54}
\]
we keep the last term from the denominator because although we are assuming that N/X is small, the fact that N is large prevents us from ignoring that term right now. We then multiply out the terms in the parentheses, again dropping the smaller terms. This leaves us with
\[
\langle n_j \rangle < \frac{X}{Nx_j} \left[ 1 + \frac{1}{2X} \sum_{k=4}^{N} x_k + \frac{Nx_2}{X} + \frac{Nx_3}{2X} + \frac{N^{3} x_N^{2}}{8X^{2}} \right]. \tag{2.55}
\]
As N ≫ 1, the sum is approximately just N times the average value of the x_k. This gives us
\[
\langle n_j \rangle < \frac{X}{Nx_j} \left[ 1 + \frac{N \overline{x}}{2X} + \frac{Nx_2}{X} + \frac{Nx_3}{2X} + \frac{N^{3} x_N^{2}}{8X^{2}} \right]. \tag{2.56}
\]
Now, since x_k ≤ x_N for all x_k ∈ H, each of the middle three terms in the parentheses is O(Nx_N/X). Meanwhile, the last term is O(N³x_N²/X²). We thus end up with
\[
\langle n_j \rangle < \frac{X}{Nx_j} \left[ 1 + O\!\left( \frac{Nx_N}{X} \right) + O\!\left( \frac{N^{3} x_N^{2}}{X^{2}} \right) \right]. \tag{2.57}
\]
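The approximation ⟨n_j⟩ ≈ X/(Nx_j) can also be tested directly against exact counts, using the identity that the number of partitions of X containing at least i copies of x_j is P_H(X − ix_j), as in (2.3). A sketch (ours; the particular H and X are arbitrary choices satisfying X ≫ N^{3/2} x_N):

```python
def partition_table(X: int, parts: list[int]) -> list[int]:
    ways = [1] + [0] * X  # ways[m] = P_H(m) for every m <= X
    for p in parts:
        for t in range(p, X + 1):
            ways[t] += ways[t - p]
    return ways

H = list(range(1, 11))  # x_j = 1, ..., 10, so N = 10 and x_N = 10
N, X = len(H), 100_000  # X far above N**1.5 * x_N, roughly 316
P = partition_table(X, H)

for xj in H:
    # <n_j> = (1/P_H(X)) * sum_i P_H(X - i*x_j)
    exact = sum(P[X - i * xj] for i in range(1, X // xj + 1)) / P[X]
    approx = X / (N * xj)
    assert abs(exact - approx) / approx < 0.01
```

Python's arbitrary-precision integers matter here: P_H(100000) for this H has over thirty digits, so the ratio is only converted to a float at the end.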
2.3.2 Lower Bound
We now turn our attention to finding the lower bound for ⟨n_j⟩. We have
\[
\begin{aligned}
\langle n_j \rangle > {} & \left[ \frac{1}{(N-1)!} \left( \prod_{k=1}^{N} \frac{1}{x_k} \right) \left( X + x_2 + \frac{1}{2} \sum_{k=3}^{N} x_k \right)^{N-1} \right]^{-1} \\
& \times \left[ \frac{1}{(N-1)!} \left( \prod_{k=1}^{N} \frac{1}{x_k} \right) \sum_{i=1}^{\lfloor X/x_j \rfloor} \left( X - ix_j + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-1}
- \frac{x_N^{2}}{8(N-4)!} \left( \prod_{k=1}^{N} \frac{1}{x_k} \right) \sum_{i=1}^{\lfloor X/x_j \rfloor} \left( X - ix_j + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-3} \right] \\
= {} & \left( X + x_2 + \frac{1}{2} \sum_{k=3}^{N} x_k \right)^{-(N-1)} \sum_{i=1}^{\lfloor X/x_j \rfloor} \left[ \left( X - ix_j + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-1}
- \frac{(N-1)(N-2)(N-3) x_N^{2}}{8} \left( X - ix_j + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-3} \right]. \tag{2.58}
\end{aligned}
\]
As we did with the upper bound, we convert the sum into an integral. This time, however, since we want a lower bound, we integrate from 1 to X/x_j. This integral is
\[
\begin{aligned}
& \int_{1}^{X/x_j} \left[ \left( X - ix_j + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-1}
- \frac{(N-1)(N-2)(N-3) x_N^{2}}{8} \left( X - ix_j + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-3} \right] di \\
&= \frac{1}{Nx_j} \left( X - x_j + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N} - \frac{1}{Nx_j} \left( \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N} \\
&\quad - \frac{(N-1)(N-3) x_N^{2}}{8x_j} \left( X - x_j + \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-2}
+ \frac{(N-1)(N-3) x_N^{2}}{8x_j} \left( \frac{1}{2} \sum_{k=4}^{N} x_k \right)^{N-2}. \tag{2.59}
\end{aligned}
\]
We can then pull out factors of X from our bound and then, taking advantage of the assumption that X ≫ 1, N as before, Taylor expand and drop small terms. This leaves us with
\[
\langle n_j \rangle > \frac{X}{Nx_j} \left[ 1 + (1-N) \left( \frac{x_2}{X} + \frac{1}{2X} \sum_{k=3}^{N} x_k \right) \right] \left[ 1 - \frac{Nx_j}{X} + \frac{N}{2X} \sum_{k=4}^{N} x_k - \frac{N^{3} x_N^{2}}{8X^{2}} \right], \tag{2.60}
\]
which we can expand to first order in X^{−1} to get
\[
\langle n_j \rangle > \frac{X}{Nx_j} \left[ 1 + \frac{(1-N) x_2}{X} + \frac{1}{2X} \sum_{k=3}^{N} x_k - \frac{Nx_3}{2X} - \frac{Nx_j}{X} - \frac{N^{3} x_N^{2}}{8X^{2}} \right]. \tag{2.61}
\]
The sum includes all of the x_k other than x_1 and x_2, the two smallest x_k, so since N ≫ 1, it is approximately N times the average value of the x_k, giving us
\[
\langle n_j \rangle > \frac{X}{Nx_j} \left[ 1 + \frac{(1-N) x_2}{X} + \frac{N \overline{x}}{2X} - \frac{Nx_3}{2X} - \frac{Nx_j}{X} - \frac{N^{3} x_N^{2}}{8X^{2}} \right]. \tag{2.62}
\]
As before, since x_k ≤ x_N for all x_k ∈ H, each of the middle four terms is O(Nx_N/X), while the last term is O(N³x_N²/X²), so we end up with
\[
\langle n_j \rangle > \frac{X}{Nx_j} \left[ 1 + O\!\left( \frac{Nx_N}{X} \right) + O\!\left( \frac{N^{3} x_N^{2}}{X^{2}} \right) \right]. \tag{2.63}
\]
2.3.3 Combining Our Results
Combining our upper and lower bounds, we have
\[
\langle n_j \rangle = \frac{X}{Nx_j} \left[ 1 + O\!\left( \frac{Nx_N}{X} \right) + O\!\left( \frac{N^{3} x_N^{2}}{X^{2}} \right) \right].
\]

Given a function f(z) analytic in an annulus r < |z − z_0| < R with R > 0, it can be expressed as the Laurent series
\[
f(z) = \sum_{j=0}^{\infty} a_j (z - z_0)^{j} + \sum_{j=1}^{\infty} a_{-j} (z - z_0)^{-j}, \tag{3.3}
\]
where both of the series converge in the annulus and converge uniformly in any subannulus r < r_1 ≤ |z − z_0| ≤ r_2 < R.

Since the exponent is negative for x < 0, the term in the first integral is bounded above by 1. Similarly, since A ≥ 1 and 1 − x ≤ 0 for x ≥ 1, the term in the second integral is also bounded above by 1. Then we have
\[
\left| \int_{\Gamma_{y\pm}} \right| \le \frac{2}{\left( \frac{(2k+1)\pi}{\log 10} \right)^{n+1}} \left( \int_{x<1} dx + \int_{x\ge 1} dx \right)
= \frac{2}{\left( \frac{(2k+1)\pi}{\log 10} \right)^{n+1}} \int_{\Gamma_{y\pm}} dx
= \frac{2}{\left( \frac{(2k+1)\pi}{\log 10} \right)^{n+1}} \cdot \frac{4\pi}{\log 10} \left( k + \frac{1}{2} \right)
= 4 \left( \frac{\log 10}{(2k+1)\pi} \right)^{n}. \tag{3.13}
\]
We next look at the integral over Γ_{x+}, which can be described by z = (2π/log 10)(k + 1/2) + iy, with −(2π/log 10)(k + 1/2) ≤ y ≤ (2π/log 10)(k + 1/2). We have
\[
\left| \int_{\Gamma_{x+}} \right| \le \int_{-\frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right)}^{\frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right)}
\left| \frac{1}{\left( \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right) + iy \right)^{n}} \right|
\left| \frac{e^{\left[1 - \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right) - iy\right] \log (A+1)} - e^{\left[1 - \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right) - iy\right] \log A}}{e^{\left[1 - \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right) - iy\right] \log 10} - 1} \right|
\left| \frac{1}{1 - \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right) - iy} \right| dy. \tag{3.14}
\]
The first and third terms are maximized in magnitude when y = 0. In the third term, if we take k ≥ 1, then (2π/log 10)(k + 1/2) > 2, so that we can drop the 1 in the denominator to upper bound the term in magnitude. As before, the numerator of the second term is bounded above by the sum of the absolute values of the two exponential terms. Once we use this to drop the imaginary components in the exponents, we see that the exponents of the two terms are negative, so the term with A + 1 is bounded above by the term with A. The denominator is minimized, thus maximizing the term, when the exponential term is real and positive. This gives
\[
\left| \int_{\Gamma_{x+}} \right| \le \int_{\Gamma_{x+}} \frac{2}{\left( \frac{(2k+1)\pi}{\log 10} \right)^{n+1}} \cdot \frac{e^{\left[1 - \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right)\right] \log A}}{1 - e^{\left[1 - \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right)\right] \log 10}} \, dy, \tag{3.15}
\]
where the integral is taken from negative y to positive y. Notice that the term inside the integral is independent of y, so we can pull it out of the integral. The denominator increases as k increases, while the numerator decreases as k increases, so to get an upper bound, we can set k = 0 and take the maximum value over all the digits A, which occurs when A = 1. This value is 1/(1 − 10e^{−π}) ≈ 1.76; for purely aesthetic reasons, we will bound this from above with 2. We then have
\[
\left| \int_{\Gamma_{x+}} \right| \le \int_{\Gamma_{x+}} \frac{4}{\left( \frac{(2k+1)\pi}{\log 10} \right)^{n+1}} \, dy
= \frac{4}{\left( \frac{(2k+1)\pi}{\log 10} \right)^{n+1}} \cdot \frac{4\pi}{\log 10} \left( k + \frac{1}{2} \right)
= 8 \left( \frac{\log 10}{(2k+1)\pi} \right)^{n}. \tag{3.16}
\]
Finally, we look at the integral over Γ_{x−}. This line is given by z = −(2π/log 10)(k + 1/2) + iy, with −(2π/log 10)(k + 1/2) ≤ y ≤ (2π/log 10)(k + 1/2). We have
\[
\left| \int_{\Gamma_{x-}} \right| \le \int_{-\frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right)}^{\frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right)}
\left| \frac{1}{\left( -\frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right) + iy \right)^{n}} \right|
\left| \frac{e^{\left[1 + \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right) - iy\right] \log (A+1)} - e^{\left[1 + \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right) - iy\right] \log A}}{e^{\left[1 + \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right) - iy\right] \log 10} - 1} \right|
\left| \frac{1}{1 + \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right) - iy} \right| dy. \tag{3.17}
\]
As before, the first and third terms are maximized in magnitude when y = 0. In the denominator of the third term, we can drop the 1 to get an upper bound. As for the second term, as before, we can get an upper bound by taking the sum of the absolute values of the two exponential terms in the numerator; in addition, after we get rid of the imaginary parts, the exponents are never negative, so the term with A is bounded from above by the term with A + 1. This then gives us
\[
\left| \int_{\Gamma_{x-}} \right| \le \int_{\Gamma_{x-}} \frac{2}{\left( \frac{(2k+1)\pi}{\log 10} \right)^{n+1}} \cdot \frac{e^{\left[1 + \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right)\right] \log (A+1)}}{e^{\left[1 + \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right)\right] \log 10} - 1} \, dy, \tag{3.18}
\]
where the integral is taken from negative y to positive y. As with the Γ_{x+} case, the term inside the integral is independent of y, so we can pull it out of the integral. Furthermore, the exponential term in the denominator is always greater than 2, so we can drop the 1 in exchange for an overall factor of 2. This then gives us
\[
\left| \int_{\Gamma_{x-}} \right| \le \frac{4}{\left( \frac{(2k+1)\pi}{\log 10} \right)^{n+1}} \cdot \frac{4\pi}{\log 10} \left( k + \frac{1}{2} \right) \left( \frac{A+1}{10} \right)^{1 + \frac{2\pi}{\log 10}\left(k+\frac{1}{2}\right)}. \tag{3.19}
\]
Now, (A + 1)/10 ≤ 1, while the exponent is always greater than 1, so we can upper bound this factor by 1. We thus end up with
\[
\left| \int_{\Gamma_{x-}} \right| \le 8 \left( \frac{\log 10}{(2k+1)\pi} \right)^{n}. \tag{3.20}
\]
Now that we have bounded each of the segments of Γ, we can put our results together. Looking at the integral at infinity, we have
\[
\begin{aligned}
\left| \int_{\Gamma_\infty} \right|
&= \lim_{k\to\infty} \left| \int_{\Gamma_{x+}} + \int_{\Gamma_{y+}} + \int_{\Gamma_{x-}} + \int_{\Gamma_{y-}} \right| \\
&\le \lim_{k\to\infty} \left[ 8 \left( \frac{\log 10}{(2k+1)\pi} \right)^{n} + 4 \left( \frac{\log 10}{(2k+1)\pi} \right)^{n} + 8 \left( \frac{\log 10}{(2k+1)\pi} \right)^{n} + 4 \left( \frac{\log 10}{(2k+1)\pi} \right)^{n} \right] \\
&= \lim_{k\to\infty} 24 \left( \frac{\log 10}{(2k+1)\pi} \right)^{n} = 0.
\end{aligned} \tag{3.21}
\]
Thus, this integral vanishes.
3.3.2 The Residue Contribution
Since the integral at infinity vanishes, all we need to calculate to get the digit distribution is the residues at the singularities on Re(z) = 1. Combining (3.9) with Cauchy’s residue theorem, we have
\[
p_A(Y_n) = -\sum_{k=-\infty}^{\infty} \operatorname{Res}\left( 1 + \frac{2\pi i}{\log 10} k \right). \tag{3.22}
\]
Before we can calculate the residues, however, we first need to determine the orders of the poles. Let us take another look at the function we are integrating over,
\[
f(z) = \frac{e^{(1-z)\log(A+1)} - e^{(1-z)\log A}}{z^{n} \left( e^{(1-z)\log 10} - 1 \right) (1 - z)}. \tag{3.23}
\]
We will first consider the poles where k ≠ 0, since the k = 0 case is more complicated. Given that k ≠ 0, the numerator is nonzero, and the part of the denominator that gives 0 is e^{(1−z)log 10} − 1. To find the orders of these poles, then, we simply need to find the orders of these roots of e^{(1−z)log 10} − 1. We know that we have a root of order at least one, so we take a derivative with respect to z. This gives us
\[
\frac{d}{dz} \left( e^{(1-z)\log 10} - 1 \right) = -10^{1-z} \log 10, \tag{3.24}
\]
which is never 0. Thus, these poles are simple poles.
Now consider the k = 0 pole, at z = 1. Unlike the poles we just considered, we also get a 0 in the denominator from the (1 − z) term. Furthermore, the numerator evaluates to 0. To find the order of this pole, then, we must subtract the order of the root in the numerator from the order of the root in the denominator. We begin with the numerator. We know that the order of the root is at least 1, so we take a derivative with respect to z. We have
\[
\frac{d}{dz} \left( e^{(1-z)\log(A+1)} - e^{(1-z)\log A} \right) = A^{1-z} \log A - (A+1)^{1-z} \log(A+1), \tag{3.25}
\]
which, when we set z = 1, gives log(A/(A+1)) ≠ 0, so we have a root of order 1. Now consider the denominator. We know that 1 is a root of order at least 2 of (1 − z)(e^{(1−z)log 10} − 1), so we take a second derivative of this with respect to z. This gives us
\[
\frac{d^{2}}{dz^{2}} \left[ (1 - z) \left( e^{(1-z)\log 10} - 1 \right) \right] = 10^{1-z} \log 10 \left( 2 + \log 10 - z \log 10 \right), \tag{3.26}
\]
which, when z = 1, is just 2 log 10 ≠ 0, which means that we have a root of order 2 in the denominator. Since 1 was a root of order 1 in the numerator, this means that it is a simple pole. Thus, all of the poles on Re(z) = 1 are simple poles.
We now proceed to calculate the residues. We begin with the residue at $z = 1$. Following (3.6), we can cancel a factor of $(z-1)$ and then apply L'Hôpital's rule to get

$$\begin{aligned}
\operatorname{Res}(1) &= \lim_{z\to 1} \frac{e^{(1-z)\log (A+1)} - e^{(1-z)\log A}}{z^n\left(1 - e^{(1-z)\log 10}\right)}\\
&= \lim_{z\to 1} \frac{A^{1-z}\log A - (A+1)^{1-z}\log (A+1)}{z^{n-1}\left(n\,10^z + 10(z\log 10 - n)\right)10^{-z}}\\
&= \frac{\log\frac{A}{A+1}}{\log 10}\\
&= -\log_{10}\left(1 + \frac{1}{A}\right). \tag{3.27}
\end{aligned}$$
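As a quick numerical sanity check (again a sketch of our own, not from the derivation itself), we can evaluate $(z-1)f(z)$ just off the pole and compare it with the closed form in (3.27).

```python
import cmath
import math

L = math.log(10)

def f(z, A, n):
    """The integrand of (3.23)."""
    num = cmath.exp((1 - z) * math.log(A + 1)) - cmath.exp((1 - z) * math.log(A))
    den = z ** n * (1 - z) * (cmath.exp((1 - z) * L) - 1)
    return num / den

A, n = 3, 5
w = 1e-6
approx = w * f(1 + w, A, n)        # (z - 1) f(z) evaluated near z = 1
exact = -math.log10(1 + 1 / A)     # the closed form (3.27)
assert abs(approx - exact) < 1e-4
```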
We now consider the residues at $z = 1 + \frac{2\pi i}{\log 10}k$, where $k \neq 0$. This is slightly more complicated, since there is no $z - 1 - \frac{2\pi i}{\log 10}k$ term in the denominator to cancel out. What we will do, then, is expand $e^{(1-z)\log 10} - 1$ in a Taylor series about $1 + \frac{2\pi i}{\log 10}k$ as

$$e^{(1-z)\log 10} - 1 = \sum_{j=0}^{\infty} \frac{(-1)^j\log^j 10}{j!}\left(z - 1 - \frac{2\pi i}{\log 10}k\right)^j - 1 = \sum_{j=1}^{\infty} \frac{(-1)^j\log^j 10}{j!}\left(z - 1 - \frac{2\pi i}{\log 10}k\right)^j. \tag{3.28}$$
We then have

$$\begin{aligned}
\operatorname{Res}\left(1 + \frac{2\pi i}{\log 10}k\right) &= \lim_{z\to 1+\frac{2\pi i}{\log 10}k} \frac{\left(e^{(1-z)\log (A+1)} - e^{(1-z)\log A}\right)\left(z - 1 - \frac{2\pi i}{\log 10}k\right)}{z^n(1-z)\displaystyle\sum_{j=1}^{\infty}\frac{(-1)^j\log^j 10}{j!}\left(z - 1 - \frac{2\pi i}{\log 10}k\right)^j}\\
&= \lim_{z\to 1+\frac{2\pi i}{\log 10}k} \frac{e^{(1-z)\log (A+1)} - e^{(1-z)\log A}}{z^n(z-1)\log 10\displaystyle\sum_{j=0}^{\infty}\frac{(-1)^j\log^j 10}{(j+1)!}\left(z - 1 - \frac{2\pi i}{\log 10}k\right)^j}\\
&= \lim_{z\to 1+\frac{2\pi i}{\log 10}k} \frac{e^{(1-z)\log (A+1)} - e^{(1-z)\log A}}{z^n(z-1)\log 10\left(1 + \displaystyle\sum_{j=1}^{\infty}\frac{(-1)^j\log^j 10}{(j+1)!}\left(z - 1 - \frac{2\pi i}{\log 10}k\right)^j\right)}\\
&= -\frac{e^{-2\pi i k\log_{10}A} - e^{-2\pi i k\log_{10}(A+1)}}{2\pi i k\left(1 + \frac{2\pi i}{\log 10}k\right)^n}. \tag{3.29}
\end{aligned}$$
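The closed form (3.29) can likewise be checked numerically against a direct evaluation of $(z - z_k)f(z)$ near the pole $z_k = 1 + \frac{2\pi i}{\log 10}k$ (a sketch of our own, with $f$ as in (3.23)).

```python
import cmath
import math

L = math.log(10)

def f(z, A, n):
    """The integrand of (3.23)."""
    num = cmath.exp((1 - z) * math.log(A + 1)) - cmath.exp((1 - z) * math.log(A))
    den = z ** n * (1 - z) * (cmath.exp((1 - z) * L) - 1)
    return num / den

def res_k(k, A, n):
    """The closed form (3.29) for the residue at z = 1 + 2*pi*i*k/log 10."""
    num = (cmath.exp(-2j * math.pi * k * math.log10(A))
           - cmath.exp(-2j * math.pi * k * math.log10(A + 1)))
    return -num / (2j * math.pi * k * (1 + 2j * math.pi * k / L) ** n)

A, n, k = 2, 4, 1
zk = 1 + 2j * math.pi * k / L
w = 1e-6
approx = w * f(zk + w, A, n)       # (z - z_k) f(z) near the simple pole
assert abs(approx - res_k(k, A, n)) < 1e-4
```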
3.3.3 Deviation from Benford
Putting together our results, we have, at the $n$th step of the bisection process,

$$p_A(Y_n) = \log_{10}\left(1 + \frac{1}{A}\right) + \sum_{\substack{k=-\infty\\ k\neq 0}}^{\infty} \frac{e^{-2\pi i k\log_{10}A} - e^{-2\pi i k\log_{10}(A+1)}}{2\pi i k\left(1 + \frac{2\pi i}{\log 10}k\right)^n}. \tag{3.30}$$
The first term, which comes from the residue at $z = 1$, gives us the Benford distribution; the contributions from the other residues, then, give us the deviation from Benford. That this deviation goes to 0 as $n$ goes to infinity is not hard to show; we can prove this by bounding the residues in terms of $n$ (other than the one at $z = 1$, of course, which is independent of $n$). Looking at (3.29), we see that the two exponential terms in the numerator, having purely imaginary exponents, each have magnitude 1; this means that we can bound the numerator from above by 2. As for the denominator, we can drop the 1 in the term in the parentheses, since $\left|1 + \frac{2\pi i}{\log 10}k\right| \geq \frac{2\pi |k|}{\log 10}$. This then gives us
$$\left|\operatorname{Res}\left(1 + \frac{2\pi i}{\log 10}k\right)\right| \leq \frac{2}{2\pi |k|\left(\frac{2\pi |k|}{\log 10}\right)^n} = \frac{2}{\log 10}\left(\frac{\log 10}{2\pi}\right)^{n+1}\frac{1}{|k|^{n+1}}. \tag{3.31}$$

As $\frac{\log 10}{2\pi} < 1$ and $|k| \geq 1$, this goes to 0 as $n$ goes to infinity, so we do actually get Benford's law in the limit.
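Since (3.30) converges quickly in $k$, the deviation can also be evaluated numerically. As an illustration (a sketch of our own, not code from the derivation), truncating the sum at $|k| \leq K$ gives:

```python
import cmath
import math

def p_A(A, n, K=2000):
    """Truncated version of (3.30): the probability that the piece size
    at step n of the bisection process has leading digit A, keeping the
    Fourier modes with |k| <= K."""
    L = math.log(10)
    total = math.log10(1 + 1 / A)      # Benford term (residue at z = 1)
    for k in range(1, K + 1):
        for s in (k, -k):              # conjugate pairs keep the sum real
            num = (cmath.exp(-2j * math.pi * s * math.log10(A))
                   - cmath.exp(-2j * math.pi * s * math.log10(A + 1)))
            den = 2j * math.pi * s * (1 + 2j * math.pi * s / L) ** n
            total += (num / den).real
    return total

# At n = 1 the formula evaluates to 1/9 for every digit, while for
# larger n the distribution is already essentially Benford.
assert abs(p_A(1, 1) - 1 / 9) < 1e-3
assert abs(p_A(1, 12) - math.log10(2)) < 1e-3
```

Note that when summed over $A = 1, \dots, 9$ the deviation terms telescope to 0 for each $k$, so the nine probabilities sum to 1 at every step.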
Chapter 4
Conclusion
In this work, we have studied two different models for breaking apart a conserved quantity. In one model, previously studied by Iafrate [Iaf14], the conserved quantity is broken up into pieces all at once. In the second model, the conserved quantity is broken up in steps, with each piece being divided randomly into two pieces at each step. Our goal was to say something about how quickly the distribution of the first digits of the piece sizes approached a Benford distribution.
In the partition model, Iafrate had previously found a requirement on the size of the conserved quantity $X$ relative to the total possible number of piece sizes $N$ and the size of the largest allowed piece $x_N$; specifically, he found that $X$ needed to be much greater than $N^2 x_N$ in order for the error terms to be negligible. Our work on this problem has focused on improving this bound, as Iafrate's work had suggested that $X \gg N x_N$ might be sufficient. Using new bounds on $P_H(X)$, the number of ways an integer can be partitioned (with some of our results coming from [Col87]), we were able to show that $X \gg N^{3/2} x_N$ was good enough, though our methods were unable to produce Iafrate's conjecture. Future work on this topic should focus either on proving Iafrate's conjectured result or on disproving it (although Iafrate's numerical calculations showed that $X \gg N x_N$ was sufficient to get Benford behavior for $H = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$, this $H$ is highly ordered, so this condition may not work for a generic $H$). From our work on this model, it seems as though any further progress would depend on improving the bounds on $P_H(X)$.
As for the random bisection model, we derived a formula for the deviation from a Benford distribution that depended only on which step in the bisection process we were on. Granted, what we ended up with was an infinite sum, but we were able to show that this deviation goes to 0 as the number of steps goes to infinity, which is what we expected; this means that even though we do not have a closed-form expression for this deviation, we can still calculate it numerically. Future work on this topic could involve extending the model to other probability distributions: what can be said about the digit distribution of the piece sizes if the bisection ratios are chosen from a non-uniform distribution? Another thing to look at would be whether or not the results obtained here, via complex analysis techniques, agree with the Mellin transform results of [JKK+09] and [BCGT+13]; the results obtained look different, but as they deal with the same problem, we expect them to actually be the same.
In our work, we have considered just two methods of breaking up a conserved quantity.
Of course, we expect each process to lead to Benford behavior in the appropriate limit, but as we have already seen in our two cases, different processes can behave quite differently, and the analysis of different models can differ completely, both in the techniques used and in the results found. Furthermore, in this work, we have constrained our analysis to one dimension; that is, we have only considered how a given length can be broken up. We could, however, extend our models to higher dimensions, considering, for example, how a given area or volume can be broken up. A good project for the future, then, might be to look at all the different ways a conserved quantity can be broken up and study how the dimension of the conserved quantity affects the behavior.
Benford’s law applies to a wide variety of problems. Here, we have only considered one of them, and even within that, we have only looked at a small portion of what could have been studied. We have made remarkable progress in this work, but there is still so much more to explore.
References
[Agn02] Geir Agnarsson. On the Sylvester Denumerants for General Restricted Partitions. Congressus Numerantium, 154:49–60, 2002.
[AS68] A. K. Adhikari and B. P. Sarkar. Distribution of Most Significant Digit in Certain Functions Whose Arguments are Random Variables. Sankhyā, The Indian Journal of Statistics, Series B, 30(1):47–58, 1968.
[BCGT+13] Thealexa Becker, Taylor C. Corcoran, Alex Greaves-Tunnell, Joseph R. Iafrate, Joy Jing, Steven Miller, Jaclyn D. Porfilio, Ryan Ronan, Jirapat Samranvedhya, and Frederick W. Strauch. Benford's Law and Continuous Dependent Random Variables. http://arxiv.org/pdf/1309.5603v2.pdf, 2013.
[Ben38] Frank Benford. The Law of Anomalous Numbers. Proceedings of the American Philosophical Society, 78(4):551–572, 1938.
[Col87] W. J. A. Colman. An Upper Bound for the General Restricted Partition Problem. The Fibonacci Quarterly, 25(1):38–44, 1987.
[Iaf14] Joseph R. Iafrate. Benford’s Law and Power Law Behavior in Fragmentation Processes. Undergraduate honors thesis, Williams College, 2014.
[JKK+09] Dennis Jang, Jung Uk Kang, Alex Kruckman, Jun Kudo, and Steven J. Miller. Chains of Distributions, Hierarchical Bayesian Models and Benford's Law. Journal of Algebra, Number Theory: Advances and Applications, 1(1), 2009.
[Lem86] Don S. Lemons. On the Numbers of Things and the Distributions of First Digits. American Journal of Physics, 54(9):816–817, 1986.
[Lon13] Jason Long. Testing Benford’s Law. http://testingbenfordslaw.com/populationofworld, 2013.
[New81] Simon Newcomb. Note on the Frequency of Use of the Different Digits in Natural Numbers. American Journal of Mathematics, 4(1):39–40, 1881.
[SS03] E. B. Saff and A. D. Snider. Fundamentals of Complex Analysis. Prentice Hall, Upper Saddle River, NJ, 3rd edition, 2003.