user2240033

Reputation: 71

Benford's law program

I have to write a program that demonstrates Benford's Law for two data lists. I think I have most of the code down, but there are small errors that I am missing. I am sorry if this is not how the site is supposed to be used, but I really need help. Here is my code.

def getData(fileName):

    data = []
    f = open(fileName,'r')
    for line in f:
        data.append(line)
    f.close()

    return data

def getLeadDigitCounts(data):

    counts = [0,0,0,0,0,0,0,0,0]

    for i in data:
        pop = i[1]
        digits = pop[0]
        int(digits)
        counts[digits-1] += 1

    return counts

def showResults(counts):

    percentage = 0
    Sum = 0
    num = 0
    Total = 0

    for i in counts:
        Total += i

    print"number of data points:",Sum
    print
    print"digit number percentage"
    for i in counts:
        Sum += i
        percentage = counts[i]/float(Sum)
        num = counts[i]
        print"5%d 6%d %f"%(i,num,percentage)


def showLeadingDigits(digit,data):

    print"Showing data with a leading",digit
    for i in data:
        if digit == i[i][1]:
            print i

def processFile(name):

    data = getData(name)
    counts = getLeadDigitCounts(data)
    showResults(counts)

    digit = input('Enter leading digit: ')
    showLeadingDigits(digit, data)

def main():

    processFile('TexasCountyPop2010.txt')
    processFile('MilesofTexasRoad.txt')

main()

Again, sorry if this is not how I am supposed to use this site. Also, I can only use programming techniques that the professor has shown us, so if you could just give me advice to clean up the code as it is, I would really appreciate it.

Also, here are a few lines from my data.

Anderson County     58458
Andrews County  14786
Angelina County     86771
Aransas County  23158
Archer County   9054
Armstrong County    1901

Upvotes: 0

Views: 5395

Answers (3)

Gonzalo Franco

Reputation: 1

Just to share a different (and maybe more step-by-step) piece of code. It's in Ruby.

The thing is, Benford's Law doesn't apply when you draw random numbers from one fixed range. The upper bound of the range you are drawing from must itself be undetermined, or unbounded.

In other words, say you used a random number generator with a set range to draw from, e.g. 1-100. You would undoubtedly end up with a random dataset, yes, but 1 would appear as a first digit about as often as 9 or any other digit.
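To make the fixed-range case concrete, here is a small Python sketch (Python because that is what the question uses). The range 1 to 99 is an assumption chosen so the counts come out exactly even; with 1 to 100 the digit 1 would get one extra hit from 100. It enumerates the range instead of sampling it, so it shows what a uniform draw converges to.

```python
# Count the leading digit of every integer in a fixed range, 1 to 99.
counts = [0] * 9
for n in range(1, 100):
    lead = int(str(n)[0])      # first character of the decimal form
    counts[lead - 1] += 1

# Every digit from 1 to 9 leads the same number of values.
print(counts)
```

Each digit d leads d itself plus d0 through d9, so no digit is favoured.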

**The interesting** part, actually, happens when you let a computer (or nature) decide randomly, on each draw, how large the random number is allowed to be. Then you get a nice two-stage random dataset whose leading digits lean heavily toward the small ones, the way Benford's Law describes. I have written this Ruby code for you, which illustrates the effect: run it a few times and, to our fascination as mathematicians, the low digits consistently come out ahead.

Take a look at this bit of code I've put together for you!
It's a bit WET, but I'm sure it'll explain.

<-- RUBY CODE BELOW -->

dataset = []

999.times do
  random = rand(999)
  dataset << rand(random)
end

startwith1 = []
startwith2 = []
startwith3 = []
startwith4 = []
startwith5 = []
startwith6 = []
startwith7 = []
startwith8 = []
startwith9 = []

dataset.each do |element|
  case element.to_s.split('')[0].to_i
  when 1 then startwith1 << element
  when 2 then startwith2 << element
  when 3 then startwith3 << element
  when 4 then startwith4 << element
  when 5 then startwith5 << element
  when 6 then startwith6 << element
  when 7 then startwith7 << element
  when 8 then startwith8 << element
  when 9 then startwith9 << element
  end
end

a = startwith1.length
b = startwith2.length
c = startwith3.length
d = startwith4.length
e = startwith5.length
f = startwith6.length
g = startwith7.length
h = startwith8.length
i = startwith9.length

sum = a + b + c + d + e + f + g + h + i

p "#{a} times first digit = 1; equating #{(a * 100) / sum}%"
p "#{b} times first digit = 2; equating #{(b * 100) / sum}%"
p "#{c} times first digit = 3; equating #{(c * 100) / sum}%"
p "#{d} times first digit = 4; equating #{(d * 100) / sum}%"
p "#{e} times first digit = 5; equating #{(e * 100) / sum}%"
p "#{f} times first digit = 6; equating #{(f * 100) / sum}%"
p "#{g} times first digit = 7; equating #{(g * 100) / sum}%"
p "#{h} times first digit = 8; equating #{(h * 100) / sum}%"
p "#{i} times first digit = 9; equating #{(i * 100) / sum}%"
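For comparison with the question's language, the same experiment could be sketched in Python roughly as follows. This is a loose port, not a line-for-line translation: it keeps one list of nine counts instead of nine arrays, fixes the random seed (an assumption, so reruns match), skips the zero cases explicitly, and prints Benford's expected share log10(1 + 1/d) next to each observed share so the two can be compared by eye.

```python
import math
import random

random.seed(0)  # assumption: fixed seed for repeatable runs (not in the Ruby)

counts = [0] * 9
for _ in range(999):
    upper = random.randrange(999)       # like rand(999): 0..998
    if upper == 0:
        continue                        # nothing to draw from an empty range
    value = random.randrange(upper)     # like rand(upper): 0..upper-1
    if value == 0:
        continue                        # 0 has no leading digit from 1 to 9
    counts[int(str(value)[0]) - 1] += 1

total = sum(counts)
for digit in range(1, 10):
    observed = 100.0 * counts[digit - 1] / total
    expected = 100.0 * math.log10(1 + 1.0 / digit)  # Benford's law
    print("%d: %3d times, %5.1f%% observed, %5.1f%% Benford"
          % (digit, counts[digit - 1], observed, expected))
```

How closely the observed column tracks the Benford column depends on the sample; the sketch only makes the comparison visible.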

Upvotes: 0

Blender

Reputation: 298326

Your error is coming from this line:

int(digits)

This doesn't actually do anything to digits: int() returns a new value and leaves its argument unchanged. If you want to convert digits to an integer, you have to assign the result back to the variable:

digits = int(digits)
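A quick way to see the difference, as a minimal sketch (the name still_text is only there for illustration):

```python
digits = '5'
int(digits)              # builds the integer 5, then discards it
still_text = digits      # digits is still the string '5'

digits = int(digits)     # keep the result by assigning it back
print("%r %r" % (still_text, digits))
```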

Also, to properly parse your data, I would do something like this:

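for line in data:
    # split once from the right: place name, then the number as text
    place, digits = line.rsplit(None, 1)
    lead = int(digits[0])  # leading digit only, not the whole number
    counts[lead - 1] += 1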
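Run against the six sample lines from the question, that parsing could look like the sketch below. It takes the first character of the number as the leading digit, and it assumes every line ends in a whitespace-separated number, which holds for the sample but is not guaranteed for the full files.

```python
# Sample lines copied from the question, with the newline a file would add.
data = [
    "Anderson County     58458\n",
    "Andrews County  14786\n",
    "Angelina County     86771\n",
    "Aransas County  23158\n",
    "Archer County   9054\n",
    "Armstrong County    1901\n",
]

counts = [0] * 9
for line in data:
    # Split once, from the right, on any run of whitespace.
    place, digits = line.rsplit(None, 1)
    counts[int(digits[0]) - 1] += 1

print(counts)  # [2, 1, 0, 0, 1, 0, 0, 1, 1]
```

Splitting from the right keeps multi-word place names such as "Anderson County" in one piece.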

Upvotes: 1

Bi Rico

Reputation: 25823

Let's walk through one cycle of your code and I think you'll see what the problem is. I'll be using this file for data:

An, 10, 22
In, 33, 44
Out, 3, 99

Now getData returns:

["An, 10, 22",
"In, 33, 44",
"Out, 3, 99"]

Now take a look at the first pass through the loop:

for i in data:
    # i = "An, 10, 22"
    pop = i[1]
    # pop = 'n', the second character of i
    digits = pop[0]
    # digits = 'n', the first character of pop
    int(digits)
    # ValueError here ('n' is not a number), and you probably wanted digits = int(digits)
    counts[digits-1] += 1

Depending on how your data is structured, you need to figure out the logic to extract the digits you expect to get from your file. This logic might belong in the getData function, but that mostly depends on the specifics of your data.
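As one possible shape for that, here is a sketch that does the extraction at load time and then counts and reports. It assumes the whitespace-separated "name, then number" layout of the question's sample lines (not the comma-separated test file above), and the function names parse_lines and format_results are made up for illustration. The report step also sidesteps two things in the question's showResults: it prints the total only after summing, and it indexes counts by digit instead of by the count value.

```python
def parse_lines(lines):
    """Turn raw lines into (place, number_text) pairs, skipping blank lines."""
    records = []
    for line in lines:
        if line.strip() == '':
            continue
        place, number = line.rsplit(None, 1)   # split once from the right
        records.append((place, number))
    return records

def get_lead_digit_counts(records):
    counts = [0] * 9
    for place, number in records:
        counts[int(number[0]) - 1] += 1        # first character is the lead digit
    return counts

def format_results(counts):
    total = sum(counts)
    rows = ["number of data points: %d" % total, "digit number percentage"]
    for digit in range(1, 10):
        num = counts[digit - 1]
        rows.append("%5d %6d %f" % (digit, num, num / float(total)))
    return rows

# Two sample lines from the question plus a blank line.
sample = ["Archer County   9054\n", "Armstrong County    1901\n", "\n"]
records = parse_lines(sample)
for row in format_results(get_lead_digit_counts(records)):
    print(row)
```

Keeping the parsed pairs around also means a later step that lists every entry with a given leading digit can compare against number[0] directly.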

Upvotes: 0
