Rush
Rush

Reputation: 137

Sort entries in the group and return the first one in alphabetical order

Trying to group Dataframe and descending sort entries for a column within each group and pick the first entry.

For example:

Input Dataframe

Col1 Col2
First Y
First N
First
First Y
Second N
Second
Second
Second
Second N

The idea is to group by col1 and descending sort entries in col2 and pick the first entry. When col1 value is First, col2 values are (Y, N, NaN, Y) ---sort descending--> (Y, Y, N, NaN) --> create new col3 with first value, i.e., Y

Expected Output:

Col1 Col2 Col3
First Y Y
First N Y
First Y
First Y Y
Second N N
Second N
Second N
Second N
Second N N

Upvotes: 1

Views: 567

Answers (1)

jezrael
jezrael

Reputation: 862661

You can convert column to ordered Categoricals if need some custom priority order and then use GroupBy.transform with min:

df['Col2'] = pd.Categorical(df['Col2'].fillna(''), ordered=True, categories=['Y','N', ''])

df['Col2'] = df.groupby('Col1')['Col2'].transform('min')
print (df)
     Col1 Col2
0   First    Y
1   First    Y
2   First    Y
3   First    Y
4  Second    N
5  Second    N
6  Second    N
7  Second    N
8  Second    N

Another idea is sorting by both columns and then use GroupBy.first in GroupBy.transform:

df = df.sort_values(['Col1','Col2'], ascending=[True, False])

df['Col2'] = df.groupby('Col1')['Col2'].transform('first')
print (df)
     Col1 Col2
0   First    Y
3   First    Y
1   First    Y
2   First    Y
4  Second    N
8  Second    N
5  Second    N
6  Second    N
7  Second    N

Upvotes: 1

Related Questions