Sort entries in the group and return the first one in alphabetical order

Question

Trying to group Dataframe and descending sort entries for a column within each group and pick the first entry.

For example:

Input Dataframe

Col1	Col2
First	Y
First	N
First
First	Y
Second	N
Second
Second
Second
Second	N

The idea is to group by col1 and descending sort entries in col2 and pick the first entry. When col1 value is First, col2 values are (Y, N, NaN, Y) ---sort descending--> (Y, Y, N, NaN) --> create new col3 with first value, i.e., Y

Expected Output:

Col1	Col2	Col3
First	Y	Y
First	N	Y
First		Y
First	Y	Y
Second	N	N
Second		N
Second		N
Second		N
Second	N	N

jezrael · Accepted Answer

You can convert column to ordered Categoricals if need some custom priority order and then use GroupBy.transform with min:

df['Col2'] = pd.Categorical(df['Col2'].fillna(''), ordered=True, categories=['Y','N', ''])

df['Col2'] = df.groupby('Col1')['Col2'].transform('min')
print (df)
     Col1 Col2
0   First    Y
1   First    Y
2   First    Y
3   First    Y
4  Second    N
5  Second    N
6  Second    N
7  Second    N
8  Second    N

Another idea is sorting by both columns and then use GroupBy.first in GroupBy.transform:

df = df.sort_values(['Col1','Col2'], ascending=[True, False])

df['Col2'] = df.groupby('Col1')['Col2'].transform('first')
print (df)
     Col1 Col2
0   First    Y
3   First    Y
1   First    Y
2   First    Y
4  Second    N
8  Second    N
5  Second    N
6  Second    N
7  Second    N

Sort entries in the group and return the first one in alphabetical order

Answers (1)

Related Questions