user2752551

Reputation: 61

Count letters in a text file

I am a beginner Python programmer and I am trying to write a program that counts how many times each letter occurs in a text file. Here is what I've got so far:

import string 
text = open('text.txt')
letters = string.ascii_lowercase
for i in text:
  text_lower = i.lower()
  text_nospace = text_lower.replace(" ", "")
  text_nopunctuation = text_nospace.strip(string.punctuation)
  for a in letters:
    if a in text_nopunctuation:
      num = text_nopunctuation.count(a)
      print(a, num)

If the text file contains hello bob, I want the output to be:

b 2
e 1
h 1
l 2
o 2

My problem is that it doesn't work properly when the text file contains more than one line of text or has punctuation.

Upvotes: 6

Views: 36269

Answers (8)

Maxim Egorushkin

Reputation: 136515

Yet another way:

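import sys
from collections import defaultdict

read_chunk_size = 65536

freq = defaultdict(int)
# Keep reading fixed-size chunks until read() returns '' at end of input.
for chunk in iter(lambda: sys.stdin.read(read_chunk_size), ''):
    for c in chunk:
        freq[c.lower()] += 1

# Sort by count, highest first.
for symbol, count in sorted(freq.items(), key=lambda kv: kv[1], reverse=True):
    print(symbol, count)

It prints the symbols from most frequent to least frequent.

The counting loop runs in time linear in the input size but needs only O(1) memory beyond the frequency table, so it can handle arbitrarily large files: it reads the input in chunks of read_chunk_size characters instead of loading it all at once.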

Upvotes: 0

Public Person

Reputation: 1

import sys

def main():
    try:
        # Open the file named on the command line; it is closed automatically.
        with open(sys.argv[1], 'r') as f:
            # Note: this counts every character in the file, not only letters.
            print("Count of all characters:", len(f.read()))
    except IndexError:
        print("You forgot to pass a file name as an argument!")
    except IOError:
        print("No such file in this folder!")

main()

python file.py countlettersfile.txt

Upvotes: -1

jfs

Reputation: 414865

You could split the problem into two simpler tasks:

#!/usr/bin/env python
import fileinput # accept input from stdin and/or files specified at command-line
from collections import Counter
from itertools import chain
from string import ascii_lowercase

# 1. count frequencies of all characters (bytes on Python 2)
freq = Counter(chain.from_iterable(fileinput.input())) # read one line at a time

# 2. print frequencies of ascii letters
for c in ascii_lowercase:
    n = freq[c] + freq[c.upper()] # merge lower- and upper-case occurrences
    if n != 0:
        print(c, n)

Upvotes: 0

tobias_k

Reputation: 82949

Just for the sake of completeness, if you want to do it without using Counter, here's another very short way, using a generator expression and the dict builtin:

from string import ascii_lowercase as letters
with open("text.txt") as f:
    text = f.read().lower()
    print(dict((l, text.count(l)) for l in letters))

f.read() reads the content of the entire file into the text variable (which might be a bad idea if the file is really large); then we use a generator expression to produce (letter, count in text) tuples and convert them to a dictionary. With Python 2.7+ you can also use {l: text.count(l) for l in letters}, which is even shorter and a bit more readable.

Note, however, that this will search the text multiple times, once for each letter, whereas Counter scans it only once and updates the counts for all the letters in one go.
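
To make the difference concrete, here is a minimal sketch that runs both approaches on an in-memory string and checks that they agree. The sample text and the variable names by_count and by_counter are made up for illustration; they are not from the question.

```python
from collections import Counter
from string import ascii_lowercase

# Hypothetical sample input standing in for the file contents.
text = "Hello, Bob!\nHello again.".lower()

# Approach 1: one full scan of the text per letter (26 scans in total).
by_count = {l: text.count(l) for l in ascii_lowercase}

# Approach 2: a single pass over the text, keeping only ASCII letters.
by_counter = Counter(c for c in text if c in ascii_lowercase)

# Both agree on every letter; a Counter returns 0 for keys it never saw.
assert all(by_count[l] == by_counter[l] for l in ascii_lowercase)
print(by_counter["l"], by_counter["o"])
```

One visible difference: the count() version always has an entry for all 26 letters (with 0 for absent ones), while the Counter only stores letters that actually occur.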

Upvotes: 1

no1

Reputation: 727

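import string
# Read all lines up front; the file is closed when the with block exits.
with open('text.txt', 'r') as fp:
    file_list = fp.readlines()
print(file_list)  # debug output: show the raw lines
freqs = {}
for line in file_list:
    # Keep only ASCII letters, lowercased.
    line = filter(lambda x: x in string.ascii_lowercase, line.lower())
    for char in line:
        if char in freqs:
            freqs[char] += 1
        else:
            freqs[char] = 1

print(freqs)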

Upvotes: 2

user2567070

Reputation:

Using re:

import re
from string import ascii_lowercase

# Placeholder text; replace it with the contents of your file.
context, m = 'some file to search or text', {}
for letter in ascii_lowercase:
    # findall returns every match of the single-letter pattern.
    m[letter] = len(re.findall(letter, context))
    print('{0} -> {1}'.format(letter, m[letter]))

Nonetheless, doing this with Counter is much more elegant and clean.

Upvotes: 1

elyase

Reputation: 41003

This is a very readable way to accomplish what you want using Counter:

from string import ascii_lowercase
from collections import Counter

with open('text.txt') as f:
    print(Counter(letter for line in f
                  for letter in line.lower()
                  if letter in ascii_lowercase))

You can iterate over the resulting Counter (a dict subclass) to print it in the format that you want.
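
For example, here is a small sketch of that iteration, printing one letter per line in alphabetical order as the question asks. The helper name count_letters is made up for this illustration, and the sample list of lines stands in for the open file:

```python
from collections import Counter
from string import ascii_lowercase

def count_letters(lines):
    """Count ASCII letters, case-insensitively, across an iterable of lines."""
    return Counter(letter for line in lines
                   for letter in line.lower()
                   if letter in ascii_lowercase)

# Hypothetical input; with a real file, pass the open file object instead.
counts = count_letters(["Hello, Bob!\n"])

# sorted() orders the (letter, count) pairs alphabetically by letter.
for letter, n in sorted(counts.items()):
    print(letter, n)
```

Because the generator runs over all lines before anything is printed, multi-line files and punctuation are handled in the same pass.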

Upvotes: 12

moliware

Reputation: 10288

You can use collections.Counter:

from collections import Counter
text = 'aaaaabbbbbccccc'
c = Counter(text)
print(c)

It prints:

Counter({'a': 5, 'b': 5, 'c': 5})

Your text variable should be:

import string
text = open('text.txt').read()
# Keep only ASCII letters (lowercased) and join them back into a string.
text = ''.join(filter(lambda x: x in string.ascii_lowercase, text.lower()))

For getting the output you need:

for letter, repetitions in c.items():
    print(letter, repetitions)

In my example it prints:

a 5
b 5
c 5

For more information, see the Counter documentation.

Upvotes: 1
