Reputation: 1

Best approach to this question using python

I am new to python and practicing some problems. Unable to optimize my solution for below problem.

problem statement: Encode words in sentence based on word frequency and return their rank and encoded value for the word.

example: input string --> 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'

expected output --> 3|2|1|3|4|2|5|1|1|2|1|4

Explanation:- because 'aaa' came 2 times in the original string, and 'ccc' 4 times and 'bb' 3 times, hence they get ranking based on frequency. In that manner 'ccc' rank is 1, 'bb' rank is 2, 'ccc' rank is 3. Hence the result as mentioned above.

Below is my python code, but unable to optimize it. Can someone please help.

def testing(s):
    ht = {}
    new_strs = strs.split()
    print(new_strs)
    for i in new_strs:
        if i in ht:
            ht[i] += 1
        else:
            ht[i] = 1
    print(ht)
    
    temp = list(map(list, sorted(ht.items(), key=lambda v: v[1], reverse=True)))
    print(temp)

    for k,v in enumerate(temp):
        temp[k].append(k+1)
    print(temp)
    
    final = []
    for j in new_strs:
        for t in temp:
            if t[0] == j:
                final.append(str(t[2]))
    return '|'.join(final)

strs = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'
result = testing(str)
print(result)

Below is the result i am getting from this code.

['aaa', 'bb', 'ccc', 'aaa', 'bbb', 'bb', 'cc', 'ccc', 'ccc', 'bb', 'ccc', 'bbb']

{'aaa': 2, 'bb': 3, 'ccc': 4, 'bbb': 2, 'cc': 1}

[['ccc', 4], ['bb', 3], ['aaa', 2], ['bbb', 2], ['cc', 1]]

[['ccc', 4, 1], ['bb', 3, 2], ['aaa', 2, 3], ['bbb', 2, 4], ['cc', 1, 5]]

3|2|1|3|4|2|5|1|1|2|1|4

Thank you in advance for your help.

Upvotes: 0

Answers (4)

pcauthorn

Reputation: 390

This is a little convoluted but would do it.

I think this is the best way to go i.e. separate the ranking logic into a class.

from collections import Counter


class Ranker:
    def __init__(self, items):
        self._item_counts = Counter(items)
        self._ranks = list(set(i[1] for i in Counter(items).most_common()))[::-1]

    def __getitem__(self, item):
        return self._ranks.index(self._item_counts[item]) + 1


if __name__ == '__main__':
    strs = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb aaa'.split()
    r = Ranker(strs)
    print('|'.join([str(r[s]) for s in strs]))
    # 2|2|1|2|3|2|4|1|1|2|1|3|2

Upvotes: 0

Brad Martsberger

Reputation: 1967

This question has changed since it was originally asked. So here is a new answer.

def testing(strs):
    new_strs = strs.split()
    ht = Counter(new_strs)
    ranks = rank(sorted(list(dict(ht).items()), key = lambda t: t[1], reverse=True))
    ranks_dict = dict(ranks)
    return '|'.join(str(ranks_dict[s]) for s in new_strs

You just need the rank function, which takes a sorted list of tuples of (value, score) and returns a list of (value, rank)

def rank(tuples):
    current_score = tuples[0][1]
    current_rank = 1
    ties = 0
    ranks = []
    for tup in tuples:
        if tup[1] == current_score:
            ties += 1
        else:
            current_rank = current_rank + ties
            ties = 1
        ranks.append((tup[0], current_rank))
        current_score = tup[1]
    return ranks

Note that I am counting two words that appear the same number of times as having the same rank. In your example you had them as different rank, but didn't provide a way to determine which was which. I hope this is enough to get you on track.

Upvotes: 0

Brad Martsberger

Reputation: 1967

As pointed out in a comment, instead of

strs = '...'  # This is a global variable

def testing(s):
    ... # Body of testing function that never references the local `s` variable

You should have

def testing(strs):
    ... # Body of testing uses `strs` as before

There's no reason to sort ht.values(), so assigning to temp can be taken out altogether.

As you loop through new_strs all you want to be doing is making a list that contains the count of the element in new_strs. This is what you stored in the ht dictionary. So

for s in new_strs:
    final.append(ht[s])

Now final is a list that contains the count of how many times the strings appeared in the original string. And you can return the same as you currently do.

I recommend making those little changes and seeing that it works. Then, once the function works as you intended, there is a lot that can be cleaned up.

You can use a defaultdict instead of a regular dictionary. And you can use a list comprehension to construct the final list.

from collections import defaultdict

def testing(strs):
    ht = defaultdict(int)
    new_strs = strs.split()
    for s in new_strs:
         ht[s] += 1  # if `s` is not in ht, the default 0 is used.
    final = [strs(ht[s]) for s in new_strs]
    return '|'.join(final)

The string join method can take a generator, so it's not necessary to create the intermediate final variable. The last two lines can be written as one

return '|'.join(strs(ht[s]) for s in new_strs)

The collections module has a Counter collection That does exactly counting things in a list. You can write this function as:

from collections import Counter

def testing(strs):
    new_strs = strs.split()
    ht = Counter(new_strs)
    return '|'.join(str(ht[s]) for s in new_strs)

Upvotes: 0

Prune

Reputation: 77857

Your code is fine through the counting. Starting with your for j loop, I'm not at all sure how you think this is supposed to work.

You need to iterate through the given words in the string -- one loop, not nested loops. For each word in the input, place its frequency into the result.

for word in new_strs:
    final.append(str(ht[word]))
print(final)

With that replacement, your output is:

['2', '3', '4', '2', '2', '3', '1', '4', '4', '3', '4', '2']
2|3|4|2|2|3|1|4|4|3|4|2

As Robert already pointed out, you have other errors in your code. In particular, you passed a type into your function. If you intended str to be a variable, don't do that. When you use a Python defined name (type string) as a variable, you damage your name space, and strange things happen.

Upvotes: 1

Best approach to this question using python

Answers (4)

Related Questions