Pepe

Reputation: 55

Python count of words by word length

I was given a .txt file with some text. I have already cleaned the text (removed punctuation, uppercase, and symbols), and now I have a string with the words. I am now trying to get the number of characters, len(), of each word in the string. Then I want to make a plot where the number of characters N is on the X-axis and the Y-axis is the number of words that have N characters.

So far I have:

text = "sample.txt"

def count_chars(txt):
    result = 0
    for char in txt:
        result += 1     # same as result = result + 1
    return result

print(count_chars(text))

So far this returns the total len() of the text instead of the length of each word.
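For comparison, here is a minimal sketch that keeps the question's character-counting loop but applies it to each word. `word_lengths` is a hypothetical helper name, and `str.split()` is assumed to be a good enough word splitter for text that has already been cleaned.

```python
def count_chars(txt):
    # Same loop as in the question: counts every character of txt.
    result = 0
    for _ in txt:
        result += 1
    return result

def word_lengths(text):
    # Hypothetical helper: apply count_chars to each word instead of
    # to the whole string. str.split() splits on any run of whitespace.
    return [count_chars(word) for word in text.split()]

print(count_chars("hello big world"))   # 15
print(word_lengths("hello big world"))  # [5, 3, 5]
```

In practice the built-in len(word) does the same job as count_chars(word).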

I would like to get something like Counter(), which returns each word with the number of times it is repeated throughout the text.

from collections import Counter
word_count = Counter(text.split())   # split first so Counter counts words, not characters
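Note that Counter counts whatever its argument yields when iterated: a string yields single characters, while a list of words yields whole words. A small illustration with a made-up sentence:

```python
from collections import Counter

text = "to be or not to be"

# Iterating over a string gives characters, so this counts letters and spaces.
char_count = Counter(text)
# Iterating over a list of words gives words, so this counts words.
word_count = Counter(text.split())

print(char_count[' '])            # 5
print(word_count.most_common(2))  # [('to', 2), ('be', 2)]
```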

I want to get the number of characters per word. Once I have such a count, the plotting should be easier.
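A minimal sketch of that count, assuming the cleaned text is a single string of whitespace-separated words: Counter can count word lengths directly instead of words. `length_histogram` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict

def length_histogram(text: str) -> Dict[int, int]:
    """Map each word length N to the number of words with N characters."""
    # str.split() with no argument never yields empty strings.
    counts = Counter(len(word) for word in text.split())
    # Sort by length so the result reads left to right like the X-axis.
    return dict(sorted(counts.items()))

print(length_histogram("the cat sat on the warm mat"))  # {2: 1, 3: 5, 4: 1}
```

The keys and values of the returned dict are the X and Y values of the requested plot.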

Thanks and anything helps!

Upvotes: 3

Views: 1380

Answers (2)

gboffi

Reputation: 25023

It looks like the accepted answer doesn't solve the problem as it was posed by the querent:

Then make a plot where N of characters is on the X-axis and the Y-axis is the number of words that have such N len() of characters

import matplotlib.pyplot as plt

# ch10 = ... the text of "Moby Dick"'s chapter 10, as found
# in https://www.gutenberg.org/files/2701/2701-h/2701-h.htm

# split ch10 into a list of words
words = ch10.split()
# some words are joined by an em-dash
words = [part for w in words for part in w.split('—')]
# strip trailing punctuation/quotes and a leading quote,
# then drop any strings that ended up empty
words = [w.rstrip(',.:;!?"').lstrip('"') for w in words]
words = [w for w in words if w]

# count the different lengths using a dict
d = {}
for w in words:
    n = len(w)
    d[n] = d.get(n, 0) + 1

# retrieve the relevant info from the dict
lengths, counts = zip(*d.items())

# plot the relevant info
plt.bar(lengths, counts)
plt.xticks(range(1, max(lengths)+1))
plt.xlabel('Word lengths')
plt.ylabel('Word counts')
# what is the longest word?
plt.title(' '.join(w for w in words if len(w)==max(lengths)))

# T H E   E N D

plt.show()
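The length-counting loop in this answer can also be written with collections.Counter. A minimal sketch, using a short made-up word list in place of the cleaned chapter text and leaving out the plotting:

```python
from collections import Counter

words = ["a", "sea", "whale", "ship", "sea", "harpoon"]  # made-up sample

# Loop version, as in the answer: dict.get supplies 0 for unseen lengths.
d = {}
for w in words:
    n = len(w)
    d[n] = d.get(n, 0) + 1

# Counter version: a single pass over the word lengths.
c = Counter(map(len, words))
print(d == dict(c))  # True

# Sorting makes the tuples ascend by length before they go to plt.bar.
lengths, counts = zip(*sorted(c.items()))
print(lengths, counts)  # (1, 3, 4, 5, 7) (1, 2, 1, 1, 1)
```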


Upvotes: 1

user15349012

Reputation:

Okay, first of all you need to open the sample.txt file.

with open('sample.txt', 'r') as text_file:
    text = text_file.read()

or

text = open('sample.txt', 'r').read()
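Both variants read the whole file into one string; the second leaves closing the file to the interpreter. A hedged alternative is pathlib, whose read_text opens, reads, and closes the file in one call. The UTF-8 encoding and the `read_words` helper name are assumptions; the temporary file exists only so the sketch runs on its own.

```python
import tempfile
from pathlib import Path

def read_words(path):
    # Hypothetical helper: read the whole file (assumed UTF-8) and
    # split it on any whitespace, including newlines.
    return Path(path).read_text(encoding="utf-8").split()

# Demo only: write a throwaway sample.txt so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("first line\nsecond  line\n", encoding="utf-8")
    words = read_words(sample)

print(words)  # ['first', 'line', 'second', 'line']
```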

Now we can compute the length of each word in the text and store it, for example, in a dict.

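counter_dict = {}
# split() with no argument splits on any whitespace (spaces, newlines)
# and never produces empty strings, unlike split(" ")
for word in text.split():
    counter_dict[word] = len(word)   # map each distinct word to its length
print(counter_dict)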
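Because the dict is keyed by word, repeated words collapse into one entry. A small sketch with a made-up sentence shows the difference between counting the dict's values (distinct words per length) and counting over the full word list (all words per length, which is what the question's plot asks for):

```python
from collections import Counter

text = "a cat and a dog"

counter_dict = {}
for word in text.split():
    counter_dict[word] = len(word)
# counter_dict is {'a': 1, 'cat': 3, 'and': 3, 'dog': 3}

# Counting the dict's values: each distinct word is counted once.
distinct_per_length = Counter(counter_dict.values())
# Counting over the word list: repeated words are counted every time.
all_per_length = Counter(len(w) for w in text.split())

print(dict(distinct_per_length))  # {1: 1, 3: 3}
print(dict(all_per_length))       # {1: 2, 3: 3}
```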

Upvotes: 5
