Reputation: 31
I have a task where I open a text file, count how many times each capitalised word occurs, and then print the three most frequent ones. My code works until it gets a text file in which a word is repeated within a line.
txt file 1:
Jellicle Cats are black and white,
Jellicle Cats are rather small;
Jellicle Cats are merry and bright,
And pleasant to hear when they caterwaul.
Jellicle Cats have cheerful faces,
Jellicle Cats have bright black eyes;
They like to practise their airs and graces
And wait for the Jellicle Moon to rise.
Results:
6 Jellicle
5 Cats
2 And
txt file 2:
Baa Baa black sheep have you any wool?
Yes sir Yes sir, wool for everyone.
One for the master,
One for the dame.
One for the little boy who lives down the lane.
Results:
1 Baa
1 One
1 Yes
1 Baa
1 One
1 Yes
1 Baa
1 One
1 Yes
Here is my code:
    wc = {}
    t3 = {}
    p = 0
    xx=0
    a = open('novel.txt').readlines()
    for i in a:
        b = i.split()
        for l in b:
            if l[0].isupper():
                if l not in wc:
                    wc[l] = 1
                else:
                    wc[l] += 1
    while p < 3:
        p += 1
        max_val=max(wc.values())
        for words in wc:
            if wc[words] == max_val:
                t3[words] = wc[words]
                wc[words] = 1
            else:
                null = 1
    while xx < 3:
        xx+=1
        maxval = max(t3.values())
        for word in sorted(t3):
            if t3[word] == maxval:
                print(t3[word],word)
                t3[word] = 1
            else:
                null+=1
Please help me solve this. Thank You!
Thank you for all the suggestions. After manually debugging the code and using your responses, I figured out that the loop while xx < 3: was unnecessary, and that the line wc[words] = 1 made the program double count the words whenever the third most frequent word occurred only once. By replacing it with wc[words] = 0 I was able to stop the words from being counted again.
Thank you!
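To illustrate why the reset value matters, here is a minimal sketch of the same repeated-max selection. It is not the exact code from the post: the function name pick_top and its reset parameter are made up for this example, the early break when nothing is left to pick is an extra guard that the post does not mention, and the counts are the ones the second text file produces.

```python
def pick_top(counts, rounds=3, reset=0):
    """Repeatedly move the words with the highest count into a result dict.

    `reset` is the value written back after a word has been picked
    (hypothetical parameter, used only to compare the two variants).
    """
    remaining = dict(counts)  # work on a copy so the caller's dict is untouched
    picked = {}
    for _ in range(rounds):
        max_val = max(remaining.values())
        if max_val <= 0:
            break  # every word has already been picked (extra guard)
        for word in remaining:
            if remaining[word] == max_val:
                picked[word] = max_val
                remaining[word] = reset
    return picked


# Capitalised-word counts for the second text file.
counts = {'Baa': 2, 'Yes': 2, 'One': 3}

with_reset_1 = pick_top(counts, reset=1)  # picked words look like fresh words with count 1
with_reset_0 = pick_top(counts, reset=0)  # picked words can never be the maximum again
print(with_reset_1)
print(with_reset_0)
```

With reset=1 the third round sees every word at a count of 1 and overwrites the real counts, which matches the "1 Baa, 1 One, 1 Yes" output shown in the question. With reset=0 the original counts survive.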
Upvotes: 2
Views: 979
Reputation: 1
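    import operator

    fname = 'novel.txt'
    # Read the whole file and split it into whitespace-separated words.
    with open(fname) as fptr:
        words = fptr.read().split()

    # Count every word that starts with an uppercase letter.
    data = {}
    for word in words:
        if word[0].isupper():
            if word in data:
                data[word] = data[word] + 1
            else:
                data[word] = 1

    # Keep the three words with the highest counts, most frequent first.
    valores_ord = dict(sorted(data.items(), key=operator.itemgetter(1), reverse=True)[:3])
    for word in valores_ord:
        print(valores_ord[word], word)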
Upvotes: 0
Reputation: 402533
This is super simple. But you'll need a few tools.
- re.sub, to get rid of punctuation
- filter, to filter out words by title case using str.istitle
- collections.Counter, to count words (do from collections import Counter first).
Assuming text holds your paragraph (the first one), this works:
In [296]: Counter(filter(str.istitle, re.sub(r'[^\w\s]', '', text).split())).most_common(3)
Out[296]: [('Jellicle', 6), ('Cats', 5), ('And', 2)]
Counter.most_common(x) returns the x most common words.
Coincidentally, this is the output for your second para:
[('One', 3), ('Baa', 2), ('Yes', 2)]
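For anyone who wants to run this outside an interactive session, the same idea can be wrapped in a small script. This is only a sketch: the function name top_capitalised and the variable rhyme are my own, and the text is the second paragraph from the question pasted in as a string rather than read from a file.

```python
import re
from collections import Counter


def top_capitalised(text, n=3):
    """Return the n most common title-case words in text as (word, count) pairs."""
    # Drop every character that is neither a word character nor whitespace.
    cleaned = re.sub(r'[^\w\s]', '', text)
    # str.istitle keeps words such as 'Baa' and rejects 'black' or 'BAA'.
    return Counter(filter(str.istitle, cleaned.split())).most_common(n)


rhyme = """Baa Baa black sheep have you any wool?
Yes sir Yes sir, wool for everyone.
One for the master,
One for the dame.
One for the little boy who lives down the lane."""

result = top_capitalised(rhyme)
print(result)  # [('One', 3), ('Baa', 2), ('Yes', 2)]
```

Note that str.istitle is stricter than checking only the first letter, so an all-caps word would be skipped here but counted by the l[0].isupper() test in the question.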
Upvotes: 4