Reputation: 55
I was given a .txt file with a text. I have already cleaned the text (removed punctuation, uppercase, symbols), and now I have a string with the words.
I am now trying to get the character count, len(), of each word in the string, and then make a plot where the number of characters N is on the X-axis and the Y-axis is the number of words that have that length.
So far I have:
text = "sample.txt"
def count_chars(txt):
    result = 0
    for char in txt:
        result += 1 # same as result = result + 1
    return result
print(count_chars(text))
So far this gives the total len() of the text instead of the length of each word.
I would like something similar to Counter(), which returns each word together with the number of times it is repeated throughout the text.
from collections import Counter
word_count=Counter(text)
I want to get the # of characters per word. Once we have such a count the plotting should be easier.
Thanks and anything helps!
Upvotes: 3
Views: 1380
Reputation: 25023
It looks like the accepted answer doesn't solve the problem as it was posed by the asker:
"Then make a plot where N of characters is on the X-axis and the Y-axis is the number of words that have such N len() of characters"
import matplotlib.pyplot as plt
# ch10 = ... the text of "Moby Dick"'s chapter 10, as found
# in https://www.gutenberg.org/files/2701/2701-h/2701-h.htm
# split chap10 into a list of words,
words = [w for w in ch10.split() if w]
# some words are joined by an em-dash
words = sum((w.split('—') for w in words), [])
# remove suffixes and one prefix
for suffix in (',','.',':',';','!','?','"'):
    words = [w.removesuffix(suffix) for w in words]
words = [w.removeprefix('"') for w in words]
# count the different lengths using a dict
d = {}
for w in words:
    l = len(w)
    d[l] = d.get(l, 0) + 1
# retrieve the relevant info from the dict
lengths, counts = zip(*d.items())
# plot the relevant info
plt.bar(lengths, counts)
plt.xticks(range(1, max(lengths)+1))
plt.xlabel('Word lengths')
plt.ylabel('Word counts')
# what is the longest word?
plt.title(' '.join(w for w in words if len(w)==max(lengths)))
# T H E E N D
plt.show()
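As a side note, the counting loop above can also be written with collections.Counter, which the question already mentions. This is only a minimal sketch: the helper name length_counts and the sample list are made up for illustration and stand in for the cleaned words list.

```python
from collections import Counter

def length_counts(words):
    """Map each word length to the number of words that have that length."""
    return Counter(len(w) for w in words)

# Hypothetical sample standing in for the cleaned `words` list.
sample_words = ["call", "me", "ishmael", "some", "years", "ago"]
counts = length_counts(sample_words)

# Sort by length so the pairs come out in X-axis order.
for n in sorted(counts):
    print(n, counts[n])
```

The result behaves like the dict d above, so sorted(counts) and the matching values can be passed to plt.bar in the same way.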
Upvotes: 1
Reputation:
Okay, first of all you need to open the sample.txt file.
with open('sample.txt', 'r') as text_file:
    text = text_file.read()
or
text = open('sample.txt', 'r').read()
Now we can compute the length of each word in the text and store it, for example, in a dict.
counter_dict = {}
# split() with no argument splits on any whitespace, including newlines
for word in text.split():
    counter_dict[word] = len(word)
print(counter_dict)
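Note that a dict keyed by word keeps a single entry per distinct word, so repeated words are not counted more than once. If the goal is the plot described in the question (number of words for each length), one possible sketch is to tally the lengths directly. The function name words_per_length and the sample sentence are assumptions for illustration only, not part of the original code.

```python
def words_per_length(text):
    """Return (lengths, counts), sorted by length, counting every word occurrence."""
    tally = {}
    for word in text.split():
        n = len(word)
        tally[n] = tally.get(n, 0) + 1
    lengths = sorted(tally)
    return lengths, [tally[n] for n in lengths]

# Hypothetical sample text; in practice this would be the contents of sample.txt.
sample_text = "the quick brown fox jumps over the lazy dog"
lengths, counts = words_per_length(sample_text)
print(lengths)  # [3, 4, 5]
print(counts)   # [4, 2, 3]
```

The two lists can then be handed to a bar plot, for example plt.bar(lengths, counts) with matplotlib, with the lengths on the X-axis and the word counts on the Y-axis.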
Upvotes: 5