Reputation: 51
I am doing a python project for my Intro to CSC class. We are given a .txt file that is basically 200,000 lines of single words. We have to read in the file line by line, and count how many times each letter in the alphabet appears as the first letter of a word. I have the count figured out and stored in a list. But now I need to print it in the format
"a:10,898 b:9,950 c:17,045 d:10,596 e:8,735
f:11,257 .... "
Another aspect is that it has to print 5 of the letter counts per line, as I did above.
This is what I am working with so far...
def main():
file_name = open('dictionary.txt', 'r').readlines()
counter = 0
totals = [0]*26
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
for i in file_name:
for n in range(0,26):
if i.startswith(alphabet[n]):
totals[n] = totals[n]+1
print(totals)
main()
This code currently outputs
[10898, 9950, 17045, 10675, 7421, 7138, 5998, 6619, 6619, 7128, 1505, 1948, 5393, 10264, 4688, 6079, 15418, 890, 10790, 20542, 9463, 5615, 2924, 3911, 142, 658]
Upvotes: 1
Views: 5187
Reputation: 10355
python 3.4
the solution is to read the line of the file into words variable below in cycle and use Counter
from collections import Counter
import string
words = 'this is a test of functionality'
result = Counter(map(lambda x: x[0], words.split(' ')))
words = 'and this is also very cool'
result = result + Counter(map(lambda x: x[0], words.split(' ')))
counters = ['{letter}:{value}'.format(letter=x, value=result.get(x, 0)) for x in string.ascii_lowercase]
if you print counters:
['a:3', 'b:0', 'c:1', 'd:0', 'e:0', 'f:1', 'g:0', 'h:0', 'i:2', 'j:0', 'k:0', 'l:0', 'm:0', 'n:0', 'o:1', 'p:0', 'q:0', 'r:0', 's:0', 't:3', 'u:0', 'v:1', 'w:0', 'x:0', 'y:0', 'z:0']
Upvotes: 0
Reputation: 9105
I would highly recommend using a dictionary to store the counts. It will greatly simplify your code, and make it much faster. I'll leave that as an exercise for you since this is clearly homework. (other hint: Counter
is even better). In addition, right now your code is only correct for lowercase letters, not uppercase ones. You need to add additional logic to either treat uppercase letters as lowercase ones, or treat them independently. Right now you just ignore them.
Having said that, the following will get it done for your current format:
print(', '.join('{}:{}'.format(letter, count) for letter, count in zip(alphabet, total)))
zip
takes n lists and generates a new list of tuples with n elements, with each element coming from one of the input lists. join
concatenates a list of strings together using the supplied separator. And format
does string interpolation to fill in values in a string with the provided ones using format specifiers.
Upvotes: 1