Reputation: 180
I have a large dataframe in the following format :
name ingredient colour similarity ids city country proba
pesto ba g 0.93 4 ve it 0.85
pesto sa p 0.93 3 to ca 0.92
pesto li y 0.99 6 lo en 0.81
pasta fl w 0.88 2 de in 0.8
pasta wa b 0.93 1 da te 0.84
egg eg w 1 5 ro ja 0.99
I want to rank all of the name
by their similarity
(higher similarity would have a higher rank and if 2 rows have the same similarity then the order at which they're appended doesn't matter) and then merge all of the duplicated rows together
the output would look like the following:
name ingredient colour similarity ids city country proba
pesto ['li', 'ba', 'sa'] ['y', 'g', 'p'] [0.99, 0.93, 0.93] [6, 4, 3] ['lo', 've', 'to'] ['en', 'it', 'ca'] [0.81, 0.85, 0.92]
pasta ['wa', 'fl'] ['b', 'w'] [0.93, 0.88] [1, 2] ['da', 'de'] ['te', 'in'] [0.84, 0.8]
egg ['eg'] ['w'] [1] [5] ['ro'] ['ja'] [0.99]
Upvotes: 4
Views: 118
Reputation: 863166
First convert name
to ordered Categorical for original ordering if order of name
is important, then sorting by both columns by DataFrame.sort_values
and last aggregate list
s:
df['name'] = pd.Categorical(df['name'], ordered=True, categories=df['name'].unique())
df1=df.sort_values(['name','similarity'], ascending=[True, False]).groupby('name').agg(list)
print (df1)
ingredient colour similarity ids city \
name
pesto [li, ba, sa] [y, g, p] [0.99, 0.93, 0.93] [6, 4, 3] [lo, ve, to]
pasta [wa, fl] [b, w] [0.93, 0.88] [1, 2] [da, de]
egg [eg] [w] [1.0] [5] [ro]
country proba
name
pesto [en, it, ca] [0.81, 0.85, 0.92]
pasta [te, in] [0.84, 0.8]
egg [ja] [0.99]
Another idea is sorting per groups:
df1 = (df.groupby('name', group_keys=False, sort=False)
.apply(lambda x: x.sort_values('similarity', ascending=False))
.groupby('name', sort=False).agg(list))
If order of name
is possible sorting, e.g. descending:
df2 = (df.sort_values(['name','similarity'], ascending=False)
.groupby('name', sort=False)
.agg(list))
print (df2)
ingredient colour similarity ids city \
name
pesto [li, ba, sa] [y, g, p] [0.99, 0.93, 0.93] [6, 4, 3] [lo, ve, to]
pasta [wa, fl] [b, w] [0.93, 0.88] [1, 2] [da, de]
egg [eg] [w] [1.0] [5] [ro]
country proba
name
pesto [en, it, ca] [0.81, 0.85, 0.92]
pasta [te, in] [0.84, 0.8]
egg [ja] [0.99]
Upvotes: 4