Reputation: 137
Trying to group Dataframe and descending sort entries for a column within each group and pick the first entry.
For example:
Input Dataframe
Col1 | Col2 |
---|---|
First | Y |
First | N |
First | |
First | Y |
Second | N |
Second | |
Second | |
Second | |
Second | N |
The idea is to group by col1 and descending sort entries in col2 and pick the first entry. When col1 value is First, col2 values are (Y, N, NaN, Y) ---sort descending--> (Y, Y, N, NaN) --> create new col3 with first value, i.e., Y
Expected Output:
Col1 | Col2 | Col3 |
---|---|---|
First | Y | Y |
First | N | Y |
First | Y | |
First | Y | Y |
Second | N | N |
Second | N | |
Second | N | |
Second | N | |
Second | N | N |
Upvotes: 1
Views: 567
Reputation: 862661
You can convert column to ordered Categoricals if need some custom priority order and then use GroupBy.transform
with min
:
df['Col2'] = pd.Categorical(df['Col2'].fillna(''), ordered=True, categories=['Y','N', ''])
df['Col2'] = df.groupby('Col1')['Col2'].transform('min')
print (df)
Col1 Col2
0 First Y
1 First Y
2 First Y
3 First Y
4 Second N
5 Second N
6 Second N
7 Second N
8 Second N
Another idea is sorting by both columns and then use GroupBy.first
in GroupBy.transform
:
df = df.sort_values(['Col1','Col2'], ascending=[True, False])
df['Col2'] = df.groupby('Col1')['Col2'].transform('first')
print (df)
Col1 Col2
0 First Y
3 First Y
1 First Y
2 First Y
4 Second N
8 Second N
5 Second N
6 Second N
7 Second N
Upvotes: 1