Ruben

Reputation: 23

Allocate the first row of a group in Pandas

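I want to flag the first row of each group, i.e. add a boolean column that is True for the first row of every (col1, col2) group and False for the rest.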

The input:

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                   'col2': [1, 1, 2, 3],
                   'col3': ['value1', 'value2', 'value3', 'value4']})

I tried:

df.groupby(['col1', 'col2']).first()

But that returns only one aggregated row per group, not a flag on each of the original rows.
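A small sketch of why this happens, using the sample frame from the question: first() is an aggregation, so it collapses each (col1, col2) group into one row indexed by the group keys. The result has three rows while df has four, so it cannot be used directly as a per-row column.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                   'col2': [1, 1, 2, 3],
                   'col3': ['value1', 'value2', 'value3', 'value4']})

# first() returns the first non-null value of each column per group,
# one row per (col1, col2) key, indexed by those keys
firsts = df.groupby(['col1', 'col2']).first()
print(firsts)

# 3 groups vs. 4 original rows: the shapes do not line up
print(len(firsts), len(df))
```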

I want this output:

col1 col2 col3    first_row
A    1    value1  True
A    1    value2  False
B    2    value3  True
B    3    value4  True

Upvotes: 2

Views: 75

Answers (2)

ipj

Reputation: 3598

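An alternative without grouping, which compares each row with the previous one (this assumes rows sharing a key are contiguous, as in the sample data):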

# a row starts a new group if col1 or col2 differs from the row above it
df['first_row'] = df.col1.shift().ne(df.col1) | df.col2.shift().ne(df.col2)

result:

  col1  col2    col3  first_row
0    A     1  value1       True
1    A     1  value2      False
2    B     2  value3       True
3    B     3  value4       True
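A sketch of the contiguity assumption, using a hypothetical unsorted frame (not from the question) in which the ('A', 1) key appears in rows 0 and 2 with a ('B', 1) row between them. The shift-based check only sees the row directly above, so it flags row 2 as a new group, while groupby(...).cumcount() (used in the other answer) counts per key regardless of position.

```python
import pandas as pd

# Hypothetical unsorted input: ('A', 1) in rows 0 and 2, split by ('B', 1)
df = pd.DataFrame({'col1': ['A', 'B', 'A'],
                   'col2': [1, 1, 1],
                   'col3': ['x', 'y', 'z']})

# shift() compares each row only with the row directly above it
shift_flag = df.col1.shift().ne(df.col1) | df.col2.shift().ne(df.col2)

# cumcount() numbers rows within each (col1, col2) group wherever they sit
cumcount_flag = df.groupby(['col1', 'col2']).cumcount().eq(0)

print(pd.DataFrame({'shift': shift_flag, 'cumcount': cumcount_flag}))
```

The two disagree on row 2. Sorting first, e.g. with df.sort_values(['col1', 'col2'], kind='stable'), makes keys contiguous so the shift approach matches again, at the cost of reordering the rows.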

Upvotes: 0

Chris Adams

Reputation: 18647

Use groupby.cumcount and eq. cumcount numbers the rows within each group starting from 0, so a count equal to 0 means the row is the first in its group:

df['first_row'] = df.groupby(['col1', 'col2']).cumcount().eq(0)

[out]

  col1  col2    col3  first_row
0    A     1  value1       True
1    A     1  value2      False
2    B     2  value3       True
3    B     3  value4       True
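A sketch that shows the intermediate cumcount values for the sample data and cross-checks the result against DataFrame.duplicated, a different pandas method that neither answer uses. With its default keep='first', duplicated marks a row only if its key already appeared earlier, so its negation is True exactly for the first row of each group.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                   'col2': [1, 1, 2, 3],
                   'col3': ['value1', 'value2', 'value3', 'value4']})

# cumcount() numbers the rows of each (col1, col2) group 0, 1, 2, ...
counts = df.groupby(['col1', 'col2']).cumcount()
print(counts.tolist())  # [0, 1, 0, 0]

df['first_row'] = counts.eq(0)

# Cross-check: "not a duplicate of an earlier key" is the same flag
alt = ~df.duplicated(subset=['col1', 'col2'])
print(df['first_row'].tolist() == alt.tolist())  # True
```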

Upvotes: 3
