Reputation: 1
I am new to python and practicing some problems. Unable to optimize my solution for below problem.
problem statement: Encode words in sentence based on word frequency and return their rank and encoded value for the word.
example: input string --> 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'
expected output --> 3|2|1|3|4|2|5|1|1|2|1|4
Explanation:- because 'aaa' came 2 times in the original string, and 'ccc' 4 times and 'bb' 3 times, hence they get ranking based on frequency. In that manner 'ccc' rank is 1, 'bb' rank is 2, 'ccc' rank is 3. Hence the result as mentioned above.
Below is my python code, but unable to optimize it. Can someone please help.
def testing(s):
ht = {}
new_strs = strs.split()
print(new_strs)
for i in new_strs:
if i in ht:
ht[i] += 1
else:
ht[i] = 1
print(ht)
temp = list(map(list, sorted(ht.items(), key=lambda v: v[1], reverse=True)))
print(temp)
for k,v in enumerate(temp):
temp[k].append(k+1)
print(temp)
final = []
for j in new_strs:
for t in temp:
if t[0] == j:
final.append(str(t[2]))
return '|'.join(final)
strs = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'
result = testing(str)
print(result)
Below is the result i am getting from this code.
['aaa', 'bb', 'ccc', 'aaa', 'bbb', 'bb', 'cc', 'ccc', 'ccc', 'bb', 'ccc', 'bbb']
{'aaa': 2, 'bb': 3, 'ccc': 4, 'bbb': 2, 'cc': 1}
[['ccc', 4], ['bb', 3], ['aaa', 2], ['bbb', 2], ['cc', 1]]
[['ccc', 4, 1], ['bb', 3, 2], ['aaa', 2, 3], ['bbb', 2, 4], ['cc', 1, 5]]
3|2|1|3|4|2|5|1|1|2|1|4
Thank you in advance for your help.
Upvotes: 0
Views: 685
Reputation: 390
This is a little convoluted but would do it.
I think this is the best way to go i.e. separate the ranking logic into a class.
from collections import Counter
class Ranker:
def __init__(self, items):
self._item_counts = Counter(items)
self._ranks = list(set(i[1] for i in Counter(items).most_common()))[::-1]
def __getitem__(self, item):
return self._ranks.index(self._item_counts[item]) + 1
if __name__ == '__main__':
strs = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb aaa'.split()
r = Ranker(strs)
print('|'.join([str(r[s]) for s in strs]))
# 2|2|1|2|3|2|4|1|1|2|1|3|2
Upvotes: 0
Reputation: 1967
This question has changed since it was originally asked. So here is a new answer.
def testing(strs):
new_strs = strs.split()
ht = Counter(new_strs)
ranks = rank(sorted(list(dict(ht).items()), key = lambda t: t[1], reverse=True))
ranks_dict = dict(ranks)
return '|'.join(str(ranks_dict[s]) for s in new_strs
You just need the rank
function, which takes a sorted list of tuples of (value, score) and returns a list of (value, rank)
def rank(tuples):
current_score = tuples[0][1]
current_rank = 1
ties = 0
ranks = []
for tup in tuples:
if tup[1] == current_score:
ties += 1
else:
current_rank = current_rank + ties
ties = 1
ranks.append((tup[0], current_rank))
current_score = tup[1]
return ranks
Note that I am counting two words that appear the same number of times as having the same rank. In your example you had them as different rank, but didn't provide a way to determine which was which. I hope this is enough to get you on track.
Upvotes: 0
Reputation: 1967
As pointed out in a comment, instead of
strs = '...' # This is a global variable
def testing(s):
... # Body of testing function that never references the local `s` variable
You should have
def testing(strs):
... # Body of testing uses `strs` as before
There's no reason to sort ht.values()
, so assigning to temp
can be taken out altogether.
As you loop through new_strs
all you want to be doing is making a list that contains the count of the element in new_strs. This is what you stored in the ht
dictionary. So
for s in new_strs:
final.append(ht[s])
Now final is a list that contains the count of how many times the strings appeared in the original string. And you can return the same as you currently do.
I recommend making those little changes and seeing that it works. Then, once the function works as you intended, there is a lot that can be cleaned up.
You can use a defaultdict instead of a regular dictionary. And you can use a list comprehension to construct the final
list.
from collections import defaultdict
def testing(strs):
ht = defaultdict(int)
new_strs = strs.split()
for s in new_strs:
ht[s] += 1 # if `s` is not in ht, the default 0 is used.
final = [strs(ht[s]) for s in new_strs]
return '|'.join(final)
The string join method can take a generator, so it's not necessary to create the intermediate final
variable. The last two lines can be written as one
return '|'.join(strs(ht[s]) for s in new_strs)
The collections module has a Counter collection That does exactly counting things in a list. You can write this function as:
from collections import Counter
def testing(strs):
new_strs = strs.split()
ht = Counter(new_strs)
return '|'.join(str(ht[s]) for s in new_strs)
Upvotes: 0
Reputation: 77857
Your code is fine through the counting. Starting with your for j
loop, I'm not at all sure how you think this is supposed to work.
You need to iterate through the given words in the string -- one loop, not nested loops. For each word in the input, place its frequency into the result.
for word in new_strs:
final.append(str(ht[word]))
print(final)
With that replacement, your output is:
['2', '3', '4', '2', '2', '3', '1', '4', '4', '3', '4', '2']
2|3|4|2|2|3|1|4|4|3|4|2
As Robert
already pointed out, you have other errors in your code. In particular, you passed a type into your function. If you intended str
to be a variable, don't do that. When you use a Python defined name (type string) as a variable, you damage your name space, and strange things happen.
Upvotes: 1