Reputation: 83
I am an undergraduate student who is new here and loves programming. I ran into a problem while practicing and would like to ask for help.
Given a string and an integer n, return the nth most common word and its count, ignoring capitalization.
For the word, make sure all the letters are lowercase when you return it!
Hint: The split() function and dictionaries may be useful.
Example:
Input: "apple apple apple blue BlUe call", 2
Output: The list ["blue", 2]
My code is as follows:
from collections import Counter
def nth_most(str_in, n):
    split_it = str_in.split(" ")
    array = []
    for word, count in Counter(split_it).most_common(n):
        list = [word, count]
        array.append(count)
        array.sort()
        if len(array) - n <= len(array) - 1:
            c = array[len(array) - n]
            return [word, c]
The test results are as follows:
Traceback (most recent call last):
  File "/grade/run/test.py", line 10, in test_one
    self.assertEqual(nth_most('apple apple apple blue blue call', 3), ['call', 1])
  File "/grade/run/bin/nth_most.py", line 10, in nth_most
    c = array[len(array) - n]
IndexError: list index out of range
and
Traceback (most recent call last):
  File "/grade/run/test.py", line 20, in test_negative
    self.assertEqual(nth_most('awe Awe AWE BLUE BLUE call', 1), ['awe', 3])
AssertionError: Lists differ: ['BLUE', 2] != ['awe', 3]
First differing element 0:
'BLUE'
'awe'
I don't know what's wrong with my code.
Thank you very much for your help!
Upvotes: 2
Views: 1783
Reputation: 307
You can even do this without the collections module:
import re
paragraph = 'Nory was a Catholic because her mother was a Catholic, and Nory’s mother was a Catholic because her father was a Catholic, and her father was a Catholic because his mother was a Catholic, or had been'
def nth_common(n, p):
    # Lowercase the text, then split on runs of non-word characters
    words = re.split(r'\W+', p.lower())
    word_count = {}
    for i in words:
        if i in word_count:
            word_count[i] += 1
        else:
            word_count[i] = 1
    # Sort (word, count) pairs by count, highest first
    sorted_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
    return sorted_count[n-1]
nth_common(3, paragraph)
The output will be ('catholic', 6).
The word counts, sorted by count: [('was', 6), ('a', 6), ('catholic', 6), ('because', 3), ('her', 3), ('mother', 3), ('nory', 2), ('and', 2), ('father', 2), ('s', 1), ('his', 1), ('or', 1), ('had', 1), ('been', 1)]
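As a rough sketch, the same dictionary approach can be adapted to the signature and list output the question asks for. It uses only split() and a plain dict, as the hint suggests. The name nth_most comes from the question; everything else here is an assumption for illustration, and ties are resolved by first appearance because sorted() is stable.

```python
def nth_most(str_in, n):
    # Count lowercase words with a plain dictionary
    counts = {}
    for word in str_in.lower().split():
        counts[word] = counts.get(word, 0) + 1
    # Highest count first; tied words keep their first-seen order
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    word, count = ranked[n - 1]
    return [word, count]

print(nth_most("apple apple apple blue BlUe call", 2))  # ['blue', 2]
```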
Upvotes: 0
Reputation: 17322
Counter.most_common() returns the most common elements in descending order of count, so you can do:
list(Counter(str_in.lower().split()).most_common(n)[-1]) # n is nth most common word
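To check the one-liner against the two failing tests from the question, it can be wrapped in a function. This is a minimal sketch: the range check and the ValueError are my own assumption (without them, most_common(n)[-1] silently returns the least common word when n exceeds the number of distinct words).

```python
from collections import Counter

def nth_most(str_in, n):
    # Lowercase first so 'BlUe' and 'blue' count as the same word
    counts = Counter(str_in.lower().split())
    # Assumed guard: reject n outside 1..number of distinct words
    if not 1 <= n <= len(counts):
        raise ValueError("n must be between 1 and the number of distinct words")
    # most_common(n) returns the n most frequent pairs; the last is the nth
    return list(counts.most_common(n)[-1])

print(nth_most('apple apple apple blue blue call', 3))  # ['call', 1]
print(nth_most('awe Awe AWE BLUE BLUE call', 1))        # ['awe', 3]
```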
Upvotes: 3
Reputation: 16772
def nth_common(lowered_words, check):
    m = []
    for i in lowered_words:
        m.append((i, lowered_words.count(i)))
    for i in set(m):
        # print(i)
        if i[1] == check:  # check whether the count (index 1 of the tuple) equals check
            print(i, "found")
    del m[:]  # deleting list for using it again
words = ['apple', 'apple', 'apple', 'blue', 'BLue', 'call', 'cAlL']
lowered_words = [x.lower() for x in words]  # ignoring the uppercase
check = 2  # the check
nth_common(lowered_words, check)
OUTPUT:
('blue', 2) found
('call', 2) found
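Calling list.count() once per word rescans the whole list each time. A possible alternative, sketched here with a hypothetical helper name words_with_count, counts everything in one pass with Counter and then filters on the exact count. The result is sorted alphabetically only so the output order is predictable.

```python
from collections import Counter

def words_with_count(words, check):
    # One pass over the input to count lowercase words
    counts = Counter(w.lower() for w in words)
    # Keep the words that occur exactly `check` times
    return sorted(word for word, count in counts.items() if count == check)

words = ['apple', 'apple', 'apple', 'blue', 'BLue', 'call', 'cAlL']
print(words_with_count(words, 2))  # ['blue', 'call']
```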
Upvotes: 2
Reputation: 140188
Since you're using Counter, just use it wisely:
import collections
def nth_most(str_in, n):
    c = sorted(collections.Counter(w.lower() for w in str_in.split()).items(), key=lambda x: x[1])
    return list(c[-n])  # convert to list as it seems to be the expected output
print(nth_most("apple apple apple blue BlUe call", 2))
Build the word frequency dictionary, sort items according to values (2nd element of the tuple) and pick the nth last element.
This prints ['blue', 2].
What if two words have the same frequency (a tie) in first or second position? Then this solution doesn't work. Instead, sort the occurrence counts, take the nth largest count, and run through the counter dict again to extract the words that match it.
def nth_most(str_in, n):
    c = collections.Counter(w.lower() for w in str_in.split())
    nth_occs = sorted(c.values())[-n]
    return [[k, v] for k, v in c.items() if v == nth_occs]
print(nth_most("apple apple apple call blue BlUe call woot", 2))
This time it prints:
[['call', 2], ['blue', 2]]
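One thing to be aware of: sorted(c.values()) keeps duplicate counts, so with the input above n=3 would still select the frequency 2. If the intent is to rank by distinct frequencies, a variant along these lines might work. The function name nth_most_distinct is hypothetical, and ranking by distinct counts is an assumption about the desired behavior, not something the question specifies.

```python
import collections

def nth_most_distinct(str_in, n):
    c = collections.Counter(w.lower() for w in str_in.split())
    # Deduplicate the counts so tied words share a single rank
    distinct = sorted(set(c.values()), reverse=True)
    nth_occs = distinct[n - 1]
    # Return every word whose count matches the nth distinct frequency
    return [[k, v] for k, v in c.items() if v == nth_occs]

text = "apple apple apple call blue BlUe call woot"
print(nth_most_distinct(text, 2))  # [['call', 2], ['blue', 2]]
print(nth_most_distinct(text, 3))  # [['woot', 1]]
```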
Upvotes: 3
Reputation: 111
Traceback (most recent call last):
  File "/grade/run/test.py", line 10, in test_one
    self.assertEqual(nth_most('apple apple apple blue blue call', 3), ['call', 1])
  File "/grade/run/bin/nth_most.py", line 10, in nth_most
    c = array[len(array) - n]
IndexError: list index out of range
To get rid of this "list index out of range" error, just preallocate the list:
maxN = 1000  # change according to your max length
array = [0 for _ in range(maxN)]
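For context, the IndexError itself can be reproduced in isolation. The sketch below assumes the if statement in the question's code runs inside the for loop, so that array holds a single count on the first iteration while n is 3. The variable names mirror the question; the try/except is only there so the snippet runs to completion.

```python
array = [3]              # assumed state after the first loop iteration
n = 3
index = len(array) - n   # 1 - 3 == -2

# The guard from the question is true for every n >= 1, so it filters nothing
guard = index <= len(array) - 1

try:
    array[index]         # -2 is out of range for a one-element list
    message = None
except IndexError as exc:
    message = str(exc)

print(index, guard, message)  # -2 True list index out of range
```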
Upvotes: 1