Hatt

Reputation: 709

Loop Through a Grouped Column in Pandas

I'm trying to compare every combination of phrases within each group and score how well they match. I'm stuck on looping through the groups:

import pandas as pd
from fuzzywuzzy import fuzz as fz
import itertools

data = [[1,'ab'],[1,'bc'],[1,'de'],[2,'gh'],[2,'hi'],[2,'jk'],[3,'kl'],[3,'lm'],[3,'yz']]
df = pd.DataFrame(data,columns=['Ids','DESCR'])

def iterated(df):
    for a, b in itertools.product(df['DESCR'],df['DESCR']):
        try:
            print(a, b, fz.partial_ratio(a, b), fz.token_set_ratio(a,b))
        except:
            pass
    return result

df.groupby('Ids').apply(iterated(df))

The above is comparing each DESCR against everything in the whole list, rather than restricting it to each grouping. I'm getting:

ab ab 100 100
ab bc 50 50
ab de 0 0
ab gh 0 0
ab hi 0 0
ab jk 0 0
ab kl 0 0
ab lm 0 0
ab yz 0 0
bc ab 67 50
bc bc 100 100
bc de 0 0
bc gh 0 0
bc hi 0 0
bc jk 0 0
bc kl 0 0
bc lm 0 0
bc yz 0 0
...

But it should be:

ab bc 50 50
ab de 0 0
bc de 0 0
gh hi 50 50
gh jk 0 0
hi jk 50 50
...

Thank you.

Upvotes: 0

Views: 207

Answers (1)

AirSquid

Reputation: 11883

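The problem is that the groups never reach your function. The command .apply(iterated(df)) calls iterated on the entire df first and hands its return value to apply, so the loop runs over every DESCR value in the frame rather than one group at a time. Also, I think you want itertools.combinations instead of itertools.product: product yields every ordered pair, including each value paired with itself, while combinations yields each unordered pair exactly once.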
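To make the difference between calling the function and handing it to apply concrete, here is a minimal sketch. It reuses the question's df; the lambda and the names pairs, n_product and n_combos are mine, for illustration only.

```python
import itertools

import pandas as pd

data = [[1, 'ab'], [1, 'bc'], [1, 'de'], [2, 'gh'], [2, 'hi'], [2, 'jk'],
        [3, 'kl'], [3, 'lm'], [3, 'yz']]
df = pd.DataFrame(data, columns=['Ids', 'DESCR'])

# Pass a callable to apply instead of the result of calling it; pandas then
# invokes it once per group, with only that group's DESCR values.
pairs = df.groupby('Ids')['DESCR'].apply(
    lambda s: list(itertools.combinations(s, 2))
)

# product over one group's three values gives ordered pairs plus self-pairs
# such as ('ab', 'ab'); combinations keeps each unordered pair once.
n_product = len(list(itertools.product(['ab', 'bc', 'de'], repeat=2)))
n_combos = len(pairs.loc[1])
print(pairs.loc[1], n_product, n_combos)
```

Here pairs is a Series indexed by Ids, with one list of tuples per group.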

You may need to break it apart and handle the groups individually. Consider:

import itertools

import pandas as pd

data = [[1, 'ab'], [1, 'bc'], [1, 'de'], [2, 'gh'], [2, 'hi'], [2, 'jk'], [3, 'kl'], [3, 'lm'], [3, 'yz']]
df = pd.DataFrame(data, columns=['Ids', 'DESCR'])

def show_combos(group):  # replace with your function...
    # each unordered pair of DESCR values within one group, no self-pairs
    for c in itertools.combinations(group['DESCR'], 2):
        print(c)

groups = df.groupby('Ids')

# iterate through the groups, which are mini data frames
for name, group in groups:
    print('group name: {}'.format(name))
    show_combos(group)
    print()

This yields the pairs you wanted, group by group:

group name: 1
('ab', 'bc')
('ab', 'de')
('bc', 'de')

group name: 2
('gh', 'hi')
('gh', 'jk')
('hi', 'jk')

group name: 3
('kl', 'lm')
('kl', 'yz')
('lm', 'yz')
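If you would rather collect the scores than print them, the same loop can build a result frame. This is a rough sketch with some assumptions: similarity is a stand-in scorer built on the standard library's difflib so the snippet runs without fuzzywuzzy, and the column names a, b and score are made up. Swap in fz.partial_ratio or fz.token_set_ratio from the question where marked.

```python
import itertools
from difflib import SequenceMatcher

import pandas as pd

data = [[1, 'ab'], [1, 'bc'], [1, 'de'], [2, 'gh'], [2, 'hi'], [2, 'jk'],
        [3, 'kl'], [3, 'lm'], [3, 'yz']]
df = pd.DataFrame(data, columns=['Ids', 'DESCR'])


def similarity(a, b):
    # Stand-in scorer (assumption, not fuzzywuzzy): difflib ratio scaled
    # to 0-100. Replace the body with fz.partial_ratio(a, b) or similar.
    return round(100 * SequenceMatcher(None, a, b).ratio())


rows = []
for name, group in df.groupby('Ids'):
    # only pairs drawn from the current group are compared
    for a, b in itertools.combinations(group['DESCR'], 2):
        rows.append((name, a, b, similarity(a, b)))

# one row per within-group pair
scored = pd.DataFrame(rows, columns=['Ids', 'a', 'b', 'score'])
print(scored)
```

Keeping the results in a frame makes it easy to filter afterwards, for example scored[scored['score'] > 0].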

Upvotes: 1
