Adnan

Reputation: 13

Printing character percent in a text file

I just wrote a function that prints the percentage of each character in a text file. However, there is a problem: my program counts uppercase characters as different characters, and it also counts spaces, so the result is wrong. How can I fix this?

def count_char(text, char):
    count = 0
    for character in text:
        if character == char:
            count += 1
    return count

filename = input("Enter the file name: ")
with open(filename) as file:
    text = file.read()

for char in "abcdefghijklmnopqrstuvwxyz":
    perc = 100 * count_char(text, char) / len(text)
    print("{0} - {1}%".format(char, round(perc, 2)))

Upvotes: 1

Views: 1619

Answers (4)

dawg

Reputation: 104082

You can use a counter and a generator expression to count all letters like so:

from collections import Counter
with open(fn) as f:  # fn holds the name of the file to read
    c=Counter(c.lower() for line in f for c in line if c.isalpha())

Explanation of generator expression:

c=Counter(c.lower() for line in f # continued below


    ^                             create a counter
          ^   ^                   each character, make lower case
                         ^        read one line from the file

 # continued
for c in line if c.isalpha())
    ^                            one character from each line of the file
           ^                     iterate over line one character at a time    
                     ^           only add the character if it is a letter (a-z, A-Z, or any other Unicode letter)

Then get the total letter counts:

total_letters=float(sum(c.values()))

Then the total percent of any letter is c[letter] / total_letters * 100

Note that the Counter c contains only letters, not spaces, digits, or punctuation. The calculated percentage for each letter is therefore its share of all letters, not of all characters in the file.
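
As a small worked example of the formula c[letter] / total_letters * 100, here is a sketch that applies the same Counter approach to an in-memory string instead of a file. The sample string and the names text and percent_l are made up for illustration.

```python
from collections import Counter

# Hypothetical sample text standing in for the file contents
text = "Hello World"

# Count letters only, folded to lower case; the space is skipped
c = Counter(ch.lower() for ch in text if ch.isalpha())
total_letters = float(sum(c.values()))

# Share of the letter 'l' among all letters, as a percentage
percent_l = c["l"] / total_letters * 100
print("l - {:.2f}%".format(percent_l))

# A letter that never occurs has a count of 0, so its share is 0
print("z - {:.2f}%".format(c["z"] / total_letters * 100))
```

Here the string has 10 letters and 'l' occurs 3 times, so the share of 'l' is 30% of the letters rather than 3 out of 11 characters.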

The advantage here:

  1. You are reading the entire file anyway to get the count of the character in question and the total of all characters. You might as well count the frequency of all characters as you read them;
  2. You do not need to read the entire file into memory. Reading it all at once is fine for smaller files but not for larger ones;
  3. A Counter will correctly return 0 for letters not in the file;
  4. Idiomatic Python.

So your entire program becomes:

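from collections import Counter

fn = input("Enter the file name: ")
with open(fn) as f:
    # Count letters only, folded to lower case, reading one line at a time
    c=Counter(c.lower() for line in f for c in line if c.isalpha())
total_letters=float(sum(c.values()))
for char in "abcdefghijklmnopqrstuvwxyz":
    # The % format type multiplies by 100 and appends the percent sign
    print("{} - {:.2%}".format(char, c[char] / total_letters))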

Upvotes: 1

Vinícius Figueiredo

Reputation: 6518

Make the text lower case with text.lower(), and to avoid counting spaces, split the string into a list of words with text.lower().split(). This should do:

def count_char(text, char):
    count = 0
    for word in text.lower().split():  # this iterates returning every word in the text
        for character in word:   # this iterates returning every character in each word
            if character == char:
                count += 1
    return count

filename = input("Enter the file name: ")
with open(filename) as file:
    text = file.read()

totalChars = sum([len(i) for i in text.lower().split()])  # characters excluding whitespace

for char in "abcdefghijklmnopqrstuvwxyz":
    perc = 100 * count_char(text, char) / totalChars
    print("{0} - {1}%".format(char, round(perc, 2)))

Notice the change in the definition of perc: sum([len(i) for i in text.lower().split()]) returns the number of characters in the list of words, whereas len(text) also counts spaces.
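
One caveat, sketched below under the assumption that the file may contain punctuation or digits: str.split() only removes whitespace, so a total built from the split words still includes characters such as commas and digits. The sample string and the names split_total and letter_total are hypothetical.

```python
# Hypothetical sample text with punctuation and digits
text = "Hi, Bob! 42"

# Total built from the split words: every non-whitespace character
split_total = sum([len(i) for i in text.lower().split()])

# Alternative total: letters only, using str.isalpha()
letter_total = sum(1 for ch in text if ch.isalpha())

print(split_total, letter_total)
```

If the percentages should be relative to letters only, the second total is probably the one to divide by.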

Upvotes: 1

Neil

Reputation: 14321

You can use the built-in str.count method to count the characters after converting everything to lowercase with str.lower. Additionally, your current program gives wrong percentages because len(text) also counts spaces and punctuation.

import string
filename = input("Enter the file name: ")
with open(filename) as file:
    text = file.read().lower()

chars = {char:text.count(char) for char in string.ascii_lowercase}
allLetters = float(sum(chars.values()))
for char in chars:
    print("{} - {}%".format(char, round(chars[char]/allLetters*100, 2)))

Upvotes: 0

Frank Niessink

Reputation: 1611

Convert the text to lower case before counting the character:

def count_char(text, char):
    count = 0
    for character in text.lower():
        if character == char:
            count += 1
    return count
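
Lower-casing fixes the upper-case problem, but on its own it leaves the len(text) denominator from the question unchanged, so spaces still dilute the percentages. One possible way to combine both fixes is sketched below; letter_percent is a hypothetical helper, not part of the original code, and char is assumed to be a lower-case letter.

```python
def count_char(text, char):
    """Count occurrences of char in text, ignoring case."""
    count = 0
    for character in text.lower():
        if character == char:
            count += 1
    return count


def letter_percent(text, char):
    """Hypothetical helper: percentage of char among the letters of text."""
    total_letters = sum(1 for character in text if character.isalpha())
    if total_letters == 0:
        return 0.0  # avoid dividing by zero when the text has no letters
    return 100 * count_char(text, char) / total_letters


print(round(letter_percent("Aa b", "a"), 2))
```

In the sample call, the text has three letters and two of them are "a" once case is ignored, so the space no longer affects the result.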

Upvotes: 0
