user3447653
user3447653

Reputation: 4148

get first row in a group and assign values

I have a pandas dataframe in the below format

 id   name  value_1  value_2
 1    def   1        0
 2    abc   0        1

I would need to sort the above dataframe based on id, name, value_1 & value_2. Following that, for every group of [id,name,value_1,value_2], get the first row and set df['result'] = 1. For the other rows in that group, set df['result'] = 0.

I do the sorting and get the first row using the below code:

df = df.sort_values(["id","name","value_1","value_2"], ascending=True)
first_row_per_group = df.groupby(["id","name","value_1","value_2"]).agg('first')

After getting the first row, I set first_row_per_group ['result'] = 1. But I am not sure how to set the other rows (non-first) rows to 0.

Any suggestions would be appreciated.

Upvotes: 0

Views: 1501

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150735

duplicated would be faster than groupby:

df = df.sort_values(['id', 'name', 'value_1', 'value_2'])
df['result'] = (~df['id'].duplicated()).astype(int)

Upvotes: 2

Asish M.
Asish M.

Reputation: 2647

use df.groupby(...).cumcount() to get a counter of rows within the group which you can then manipulate.

In [51]: df
Out[51]:
     a  b  c
0  def  1  0
1  abc  0  1
2  def  1  0
3  abc  0  1

In [52]: df2 = df.sort_values(['a','b','c'])

In [53]: df2['result'] = df2.groupby(['a', 'b', 'c']).cumcount()

In [54]: df2['result'] = np.where(df2['result'] == 0, 1, 0)

In [55]: df2
Out[55]:
     a  b  c  result
1  abc  0  1       1
3  abc  0  1       0
0  def  1  0       1
2  def  1  0       0

Upvotes: 0

Related Questions