Reputation: 61
I am a beginner Python programmer and I am trying to write a program that counts how many times each letter occurs in a text file. Here is what I've got so far:
import string
text = open('text.txt')
letters = string.ascii_lowercase
for i in text:
    text_lower = i.lower()
    text_nospace = text_lower.replace(" ", "")
    text_nopunctuation = text_nospace.strip(string.punctuation)
    for a in letters:
        if a in text_nopunctuation:
            num = text_nopunctuation.count(a)
            print(a, num)
If the text file contains hello bob, I want the output to be:
b 2
e 1
h 1
l 2
o 2
My problem is that it doesn't work properly when the text file contains more than one line of text or has punctuation.
Upvotes: 6
Views: 36269
Reputation: 136515
Yet another way:
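import sys
from collections import defaultdict
read_chunk_size = 65536
freq = defaultdict(int)
# Keep reading fixed-size chunks until read() returns '' at end of input
for chunk in iter(lambda: sys.stdin.read(read_chunk_size), ''):
    for c in chunk:
        freq[ord(c.lower())] += 1
# Sort by count, highest first
for symbol, count in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
    print(chr(symbol), count)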
It prints the symbols from most frequent to least frequent.
The character counting loop needs only O(1) memory (its running time is still linear in the size of the input) and can handle arbitrarily large files because it reads the input in read_chunk_size chunks.
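A minimal sketch of the chunked approach for Python 3, using Counter instead of defaultdict. The helper name count_chars_in_chunks and the io.StringIO stand-in for a real file are illustrative assumptions, not part of the code above.

```python
import io
from collections import Counter
from functools import partial


def count_chars_in_chunks(stream, chunk_size=65536):
    """Count lowercased characters of a text stream, one chunk at a time.

    Hypothetical helper for illustration, not a library function.
    """
    freq = Counter()
    # iter(callable, sentinel) keeps calling stream.read(chunk_size)
    # until it returns '' (end of input), so only one chunk is held
    # in memory at any time.
    for chunk in iter(partial(stream.read, chunk_size), ''):
        freq.update(chunk.lower())
    return freq


# io.StringIO stands in for an open text file or sys.stdin here;
# the tiny chunk size just forces several read() calls.
demo = count_chars_in_chunks(io.StringIO("Hello Bob"), chunk_size=4)
for symbol, count in demo.most_common():
    print(symbol, count)
```

Passing sys.stdin or an open file object instead of the StringIO should work the same way, as long as it is opened in text mode.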
Upvotes: 0
Reputation: 1
import sys
def main():
    try:
        fileCountAllLetters = open(sys.argv[1], 'r')
        print "Count all your letters: ", len(fileCountAllLetters.read())
    except IndexError:
        print "You forgot to pass a file as an argument!"
    except IOError:
        print "That file could not be opened!"
main()
python file.py countlettersfile.txt
Upvotes: -1
Reputation: 414865
You could split the problem into two simpler tasks:
#!/usr/bin/env python
import fileinput # accept input from stdin and/or files specified at command-line
from collections import Counter
from itertools import chain
from string import ascii_lowercase
# 1. count frequencies of all characters (bytes on Python 2)
freq = Counter(chain.from_iterable(fileinput.input())) # read one line at a time
# 2. print frequencies of ascii letters
for c in ascii_lowercase:
    n = freq[c] + freq[c.upper()] # merge lower- and upper-case occurrences
    if n != 0:
        print(c, n)
Upvotes: 0
Reputation: 82949
Just for the sake of completeness, if you want to do it without using Counter, here's another very short way, using a generator expression and the dict builtin:
from string import ascii_lowercase as letters
with open("text.txt") as f:
    text = f.read().lower()
print dict((l, text.count(l)) for l in letters)
f.read() will read the content of the entire file into the text variable (which might be a bad idea if the file is really large); then we use a generator expression to produce (letter, count in text) tuples and convert them to a dictionary. With Python 2.7+ you can also use {l: text.count(l) for l in letters}, which is even shorter and a bit more readable.
Note, however, that this will search the text multiple times, once for each letter, whereas Counter scans it only once and updates the counts for all the letters in one go.
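To illustrate that trade-off, here is a small Python 3 sketch that computes the same counts both ways. The sample string and the variable names are made up for illustration.

```python
from collections import Counter
from string import ascii_lowercase

text = "hello bob".lower()  # sample text, standing in for f.read().lower()

# One pass over the text per letter: 26 calls to str.count.
per_letter = {l: text.count(l) for l in ascii_lowercase}

# A single pass over the text; letters that never occur are simply absent.
single_pass = Counter(c for c in text if c in ascii_lowercase)

# Both approaches agree once absent letters are treated as zero
# (Counter returns 0 for missing keys).
assert all(per_letter[l] == single_pass[l] for l in ascii_lowercase)

# Show only the letters that actually occur.
print({l: n for l, n in per_letter.items() if n})
```

For short texts the difference is negligible; the single pass only starts to matter when the text is large.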
Upvotes: 1
Reputation: 727
import string
fp=open('text.txt','r')
file_list=fp.readlines()
print file_list
freqs = {}
for line in file_list:
    line = filter(lambda x: x in string.letters, line.lower())
    for char in line:
        if char in freqs:
            freqs[char] += 1
        else:
            freqs[char] = 1
print freqs
Upvotes: 2
Reputation:
Using re:
import re
context, m = 'some file to search or text', {}
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
for i in range(len(letters)):
    m[letters[i]] = len(re.findall('{0}'.format(letters[i]), context))
    print '{0} -> {1}'.format(letters[i], m[letters[i]])
That said, this is much more elegant and clean with Counter.
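For comparison, a sketch of how the regex idea might combine with Counter in Python 3: a single re.findall call with a character class collects every letter, and Counter tallies them. The sample string is a placeholder for the real file contents.

```python
import re
from collections import Counter

context = "some file to search or text"  # placeholder for the real file contents

# One regex scan pulls out every ASCII letter; Counter tallies them.
counts = Counter(re.findall(r"[a-z]", context.lower()))

# Print only the letters that occur, in alphabetical order.
for letter in sorted(counts):
    print("{0} -> {1}".format(letter, counts[letter]))
```

Unlike the loop above, this scans the text once rather than once per letter, and letters that never occur are simply not printed.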
Upvotes: 1
Reputation: 41003
This is a very readable way to accomplish what you want using Counter:
from string import ascii_lowercase
from collections import Counter
with open('text.txt') as f:
    print Counter(letter for line in f
                  for letter in line.lower()
                  if letter in ascii_lowercase)
You can iterate over the resulting dict to print it in the format that you want.
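A sketch of that last step in Python 3, assuming the lines come from any iterable of strings (a list stands in for the open file here). The function name letter_counts is hypothetical.

```python
from collections import Counter
from string import ascii_lowercase


def letter_counts(lines):
    """Count ASCII letters, case-insensitively, across an iterable of lines."""
    return Counter(letter
                   for line in lines
                   for letter in line.lower()
                   if letter in ascii_lowercase)


# A list of lines stands in for open('text.txt'); punctuation and
# line breaks are skipped by the ascii_lowercase membership test.
counts = letter_counts(["Hello,\n", "Bob!\n"])

# Print one "letter count" pair per line, in alphabetical order.
for letter, count in sorted(counts.items()):
    print(letter, count)
```

Because the filter works per character, multiple lines and punctuation need no special handling, which covers the two cases the question struggles with.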
Upvotes: 12
Reputation: 10288
You should use collections.Counter:
from collections import Counter
text = 'aaaaabbbbbccccc'
c = Counter(text)
print c
It prints:
Counter({'a': 5, 'c': 5, 'b': 5})
Your text variable should be:
import string
text = open('text.txt').read()
# Filter all characters that are not letters.
text = filter(lambda x: x in string.letters, text.lower())
To get the output you need:
for letter, repetitions in c.iteritems():
    print letter, repetitions
In my example it prints:
a 5
c 5
b 5
For more information, see the Counter documentation.
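The snippets in this answer are written for Python 2. A rough Python 3 equivalent, as a sketch: string.letters no longer exists (string.ascii_letters is the closest replacement), filter returns an iterator rather than a string, and iteritems() became items(). The sample text is a made-up string.

```python
import string
from collections import Counter

raw = 'aaaaa, BBBBB. ccccc!'  # made-up sample with punctuation and mixed case

# Keep only ASCII letters; Counter accepts the filter iterator directly.
c = Counter(filter(lambda x: x in string.ascii_letters, raw.lower()))

# sorted() gives alphabetical order instead of relying on dict ordering.
for letter, repetitions in sorted(c.items()):
    print(letter, repetitions)
```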
Upvotes: 1