Cos

Reputation: 129

Letter frequency from a string in a file using Python

Sample text file:

airport, 2007, 175702
airport, 2008, 173294
request, 2005, 646179
request, 2006, 677820
request, 2007, 697645
request, 2008, 795265
wandered, 2005, 83769
wandered, 2006, 87688
wandered, 2007, 108634
wandered, 2008, 171015

This text file contains a word (e.g. 'airport'), a year, and the number of times that word was used in that year. I created a class that makes the word a key and stores the year and the occurrences for that year. Now I want to find the number of occurrences of each letter from a to z. This is done by finding how many times each letter of the alphabet occurs in a word, multiplying that number by the total number of occurrences of that word, and adding the same for the other words.

Example:

'a' appears once in both wandered and airport, so we get 1 * (83769 + 87688 + 108634 + 171015) = 451106 total occurrences of 'a' in wandered and 1 * (175702 + 173294) = 348996 total occurrences of 'a' in airport, which comes to a total of 800102 times the letter 'a' has appeared. To find the frequency with which 'a' appears, we divide 800102 by the total number of letters overall, which is 25770183, and that gives a frequency of 0.031047 for the letter 'a'. 'b' and 'c' would be 0.0 since no words currently use those letters.
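The totals in this example can be checked with a short sketch. The `sample` dictionary is just the sample data typed in by hand (it is not the class from `wordData`), and the variable names are only illustrative.

```python
# Sample data copied from the text file above: word -> list of yearly counts.
sample = {
    "airport": [175702, 173294],
    "request": [646179, 677820, 697645, 795265],
    "wandered": [83769, 87688, 108634, 171015],
}

# Weighted occurrences of 'a':
# (times 'a' appears in the word) * (total uses of that word), summed over words.
a_total = sum(word.count("a") * sum(counts) for word, counts in sample.items())

# Total number of letters: (word length) * (total uses of that word).
all_letters = sum(len(word) * sum(counts) for word, counts in sample.items())

print(a_total)      # 800102
print(all_letters)  # 25770183

# Frequency of 'a' among all letters.
print(a_total / all_letters)
```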

This is what I have so far, but it's not working at all and I am out of ideas:

from wordData import*

def letterFreq(words):
    totalLetters = 0
    letterDict = {'a':0,'b':0,'c':0,'d':0,'e':0,'f':0,'g':0,'h':0,'i':0,'j':0,'k':0,'l':0,'m':0,'n':0,'o':0,'p':0,'q':0,
                  'r':0,'s':0,'t':0,'u':0,'v':0,'w':0,'x':0,'y':0,'z':0}

    for word in words:
        totalLetters += totalOccurances(word,words)*len(word)
        for char in range(0,len(word)):
            for letter in letterDict:
                if letter == word[char]:
                    for year in words[word]:
                        letterDict[letter] += year.count
    for letters in letterDict:
        letterDict[letters] /= totalLetters


    print(letterDict)

def main():
    filename = "data/very_short.csv"
    words = readWordFile(filename)
    letterFreq(words)

    if __name__ == '__main__':
        main()

Upvotes: 2

Views: 735

Answers (1)

Padraic Cunningham

Reputation: 180481

If you want the count of all the letters in the words in the file, use a collections.Counter dict:

from collections import Counter

c = Counter()
with open("input.txt") as f:
    for line in f:
        # The word is the first comma-separated field; update() counts each of its letters.
        c.update(line.split(",")[0])
print(c)
# Output for the sample file:
# Counter({'e': 16, 'r': 12, 'd': 8, 'a': 6, 't': 6, 'n': 4, 'q': 4, 's': 4, 'u': 4, 'w': 4, 'i': 2, 'o': 2, 'p': 2})

To get the weighted total, multiply each letter's count by the number of times the word appears:

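from collections import Counter

c = Counter()
with open("input.txt") as f:
    for line in f:
        # Fields are comma-separated: word, year, count.
        word, year, count = line.split(",")
        # Add the word's count once per letter occurrence.
        for letter in word.strip():
            c[letter] += int(count)

# Frequency of 'a' among all letters.
print(c["a"] / float(sum(c.values())))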
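Since the question asks for every letter from a to z, the same idea can be extended as in the sketch below. This is only one possible approach: `letter_frequencies` is a hypothetical helper name, and it assumes every non-blank line has exactly three comma-separated fields, as in the sample file. It takes any iterable of lines, so an open file object works as well as a list.

```python
from collections import Counter
from string import ascii_lowercase


def letter_frequencies(lines):
    """Return {letter: frequency} for a-z from 'word, year, count' lines.

    Hypothetical helper; assumes three comma-separated fields per line.
    """
    totals = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, _year, count = (field.strip() for field in line.split(","))
        for letter in word.lower():
            totals[letter] += int(count)

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No data: report every letter as 0.0 instead of dividing by zero.
        return {letter: 0.0 for letter in ascii_lowercase}
    # Letters that never appear get 0.0 because a Counter returns 0 for missing keys.
    return {letter: totals[letter] / grand_total for letter in ascii_lowercase}


# Usage with the sample data from the question.
sample_lines = [
    "airport, 2007, 175702",
    "airport, 2008, 173294",
    "request, 2005, 646179",
    "request, 2006, 677820",
    "request, 2007, 697645",
    "request, 2008, 795265",
    "wandered, 2005, 83769",
    "wandered, 2006, 87688",
    "wandered, 2007, 108634",
    "wandered, 2008, 171015",
]
freqs = letter_frequencies(sample_lines)
for letter in ascii_lowercase:
    print(letter, freqs[letter])
```

With a real file, `letter_frequencies(open("input.txt"))` or a `with` block would work the same way, since the function only iterates over lines.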

Upvotes: 3
