Reputation: 91
Suppose the dataframe is as follows:
id class count
0 A 2
0 B 2
0 C 2
0 D 1
1 A 3
1 B 3
1 E 2
2 D 4
2 F 2
For each id, I want to find the class whose count is the maximum. If more than one class have the same count, merge them into one line. For the above example, the result should be as follows:
id class count
0 A,B,C 2
1 A,B 3
2 D 4
How to use a statement in pandas to realize this function?
Upvotes: 2
Views: 101
Reputation: 294218
option 1
s = df.set_index(['id', 'class'])['count']
s1 = s[s.eq(s.groupby(level=0).max())].reset_index()
s1.groupby(
['id', 'count']
)['class'].apply(list).reset_index()[['id', 'class', 'count']]
id class count
0 0 [A, B, C] 2.0
1 1 [A, B] 3.0
2 2 [D] 4.0
option 2
d1 = df.set_index(['id', 'class'])['count'].unstack()
v = d1.values
m = np.nanmax(v, 1)
t = v == m[:, None]
pd.DataFrame({
'id': d1.index,
'class': [list(s) for s in t.dot(d1.columns)],
'count': m
})[['id', 'class', 'count']]
id class count
0 0 [A, B, C] 2.0
1 1 [A, B] 3.0
2 2 [D] 4.0
Upvotes: 3
Reputation: 862406
Solution with transform
and aggregate
:
df = df[g['count'].transform('max').eq(df['count'])]
print (df)
id class count
0 0 A 2
1 0 B 2
2 0 C 2
4 1 A 3
5 1 B 3
7 2 D 4
df = df.groupby('id').agg({'class':','.join, 'count':'first'}).reset_index()
print (df)
id class count
0 0 A,B,C 2
1 1 A,B 3
2 2 D 4
Another solution with custom function:
def f(x):
x = x[x['count'] == x['count'].max()]
return (pd.Series([','.join(x['class'].values.tolist()), x['count'].iat[0]],
index=['class','count']))
df = df.groupby('id').apply(f).reset_index()
print (df)
id class count
0 0 A,B,C 2
1 1 A,B 3
2 2 D 4
Upvotes: 3