zqiang0zh
zqiang0zh

Reputation: 91

How to select rows from the pandas dataframe with certain conditions

Suppose the dataframe is as follows:

id   class  count
 0     A      2
 0     B      2
 0     C      2
 0     D      1
 1     A      3
 1     B      3
 1     E      2
 2     D      4
 2     F      2 

For each id, I want to find the class whose count is the maximum. If more than one class have the same count, merge them into one line. For the above example, the result should be as follows:

id     class    count
 0     A,B,C      2
 1     A,B        3
 2     D          4

How to use a statement in pandas to realize this function?

Upvotes: 2

Views: 101

Answers (2)

piRSquared
piRSquared

Reputation: 294218

option 1

s = df.set_index(['id', 'class'])['count']
s1 = s[s.eq(s.groupby(level=0).max())].reset_index()
s1.groupby(
    ['id', 'count']
)['class'].apply(list).reset_index()[['id', 'class', 'count']]

   id      class  count
0   0  [A, B, C]    2.0
1   1     [A, B]    3.0
2   2        [D]    4.0

option 2

d1 = df.set_index(['id', 'class'])['count'].unstack()

v = d1.values
m = np.nanmax(v, 1)
t = v == m[:, None]
pd.DataFrame({
        'id': d1.index,
        'class': [list(s) for s in t.dot(d1.columns)],
        'count': m
    })[['id', 'class', 'count']]

   id      class  count
0   0  [A, B, C]    2.0
1   1     [A, B]    3.0
2   2        [D]    4.0

Upvotes: 3

jezrael
jezrael

Reputation: 862406

Solution with transform and aggregate:

df = df[g['count'].transform('max').eq(df['count'])]
print (df)
   id class  count
0   0     A      2
1   0     B      2
2   0     C      2
4   1     A      3
5   1     B      3
7   2     D      4

df = df.groupby('id').agg({'class':','.join, 'count':'first'}).reset_index()
print (df)
   id  class  count
0   0  A,B,C      2
1   1    A,B      3
2   2      D      4

Another solution with custom function:

def f(x):
    x = x[x['count'] == x['count'].max()]
    return (pd.Series([','.join(x['class'].values.tolist()), x['count'].iat[0]], 
                      index=['class','count']))

df = df.groupby('id').apply(f).reset_index()
print (df)
   id  class  count
0   0  A,B,C      2
1   1    A,B      3
2   2      D      4

Upvotes: 3

Related Questions