Reputation: 23
I want to flag the first row of each group.
The input:
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                   'col2': [1, 1, 2, 3],
                   'col3': ['value1', 'value2', 'value3', 'value4']})
I tried:
df.groupby(['col1', 'col2']).first()
But that returns only the first row of each group, not a flag on every row.
I want this output:
col1 col2 col3 first_row
A 1 value1 True
A 1 value2 False
B 2 value3 True
B 3 value4 True
Upvotes: 2
Views: 75
Reputation: 3598
An alternative without grouping, assuming rows of the same group are adjacent (for example, the frame is sorted by col1 and col2):
df['first_row'] = df.col1.shift().ne(df.col1) | df.col2.shift().ne(df.col2)
Result:
col1 col2 col3 first_row
0 A 1 value1 True
1 A 1 value2 False
2 B 2 value3 True
3 B 3 value4 True
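To see why adjacency matters for the shift comparison, here is a minimal sketch on a reordered copy of the question's frame. The name `shuffled` and the chosen row order are assumptions made purely for illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                   'col2': [1, 1, 2, 3],
                   'col3': ['value1', 'value2', 'value3', 'value4']})

# Hypothetical reordering: the two ('A', 1) rows are no longer next to each other.
shuffled = df.iloc[[0, 2, 1, 3]].reset_index(drop=True)

# Same expression as in the answer: True wherever col1 or col2
# differs from the value in the previous row.
shuffled['first_row'] = (shuffled.col1.shift().ne(shuffled.col1)
                         | shuffled.col2.shift().ne(shuffled.col2))

flags = shuffled['first_row'].tolist()
print(shuffled)
```

Here every row differs from the one before it, so the second ('A', 1) row is also flagged True. Sorting by col1 and col2 before applying the expression is one way to avoid this.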
Upvotes: 0
Reputation: 18647
Use groupby.cumcount and eq. If the cumulative count equals 0, the row is the first one in its group:
df['first_row'] = df.groupby(['col1', 'col2']).cumcount().eq(0)
[out]
col1 col2 col3 first_row
0 A 1 value1 True
1 A 1 value2 False
2 B 2 value3 True
3 B 3 value4 True
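As a self-contained sketch of the same idea, the snippet below rebuilds the question's frame and also applies the cumcount check to a reordered copy. The `reordered` frame and its row order are hypothetical, added only to illustrate that cumcount numbers rows within each group regardless of where they sit in the frame.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                   'col2': [1, 1, 2, 3],
                   'col3': ['value1', 'value2', 'value3', 'value4']})

# cumcount numbers the rows of each (col1, col2) group 0, 1, 2, ...
# so a count of 0 marks the first row of its group.
df['first_row'] = df.groupby(['col1', 'col2']).cumcount().eq(0)
original_flags = df['first_row'].tolist()

# Hypothetical reordering: the ('A', 1) rows are separated by a 'B' row.
reordered = df.drop(columns='first_row').iloc[[0, 2, 1, 3]].reset_index(drop=True)
reordered['first_row'] = reordered.groupby(['col1', 'col2']).cumcount().eq(0)
reordered_flags = reordered['first_row'].tolist()

print(df)
print(reordered)
```

In the reordered frame the second ('A', 1) row is still flagged False, because the count is taken per group rather than relative to the previous row.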
Upvotes: 3