user2590177

Reputation: 177

Create new column in a dataframe based on group by and other conditions

I have a data frame in which each id has either one row or two rows.

I need to create two new columns. If an id group has only one row, the values are determined by a condition on the 'type' column. If it has two rows, the values depend on 'type' and are taken from the other row of the group.

An example makes this clearer.

We start with this data frame:

index  id  type
0      1   d
1      2   pu
2      3   pu
3      3   d
4      4   pu
5      4   d

The new columns follow these rules:

if 1 row per 'id':
   if type == 'd':
      didx = -1
      puidx = -10
   elif type == 'pu':
      didx = -10
      puidx = -1
elif 2 rows per 'id':
   if type == 'd':
      didx = -1
      puidx = index of the other row with same 'id'
   elif type == 'pu':
      didx = index of the other row with same 'id'
      puidx = -1

Expected output:

index  id  type  didx  puidx
0      1   d     -1    -10
1      2   pu    -10   -1
2      3   pu    3     -1
3      3   d     -1    2
4      4   pu    5     -1
5      4   d     -1    4
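
For reference, the rules above can be written as a plain, unoptimized loop over the groups. This is only a sketch to pin down the expected result. The helper name `reference_idx` is hypothetical, and 'index' is assumed to be a regular column of the data frame.

```python
import pandas as pd

# Sample data from the question; 'index' is assumed to be a regular column.
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 5],
    "id": [1, 2, 3, 3, 4, 4],
    "type": ["d", "pu", "pu", "d", "pu", "d"],
})


def reference_idx(frame):
    """Hypothetical helper: apply the rules row by row (slow, but explicit)."""
    # Both columns start at -1; each row then overwrites exactly one of them.
    didx = pd.Series(-1, index=frame.index)
    puidx = pd.Series(-1, index=frame.index)
    for _, group in frame.groupby("id"):
        for pos, row in group.iterrows():
            if len(group) == 1:
                other = -10  # no other row in this id group
            else:
                # 'index' value of the other row with the same id
                other = group.loc[group.index != pos, "index"].iloc[0]
            if row["type"] == "d":
                puidx.loc[pos] = other
            else:  # type == 'pu'
                didx.loc[pos] = other
    return frame.assign(didx=didx, puidx=puidx)


result = reference_idx(df)
print(result)
```

A loop like this is easy to check against the rules, which makes it a handy baseline when comparing faster, vectorized solutions.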

I assume we need to use groupby and apply, but I am not sure how to make it efficient. Thanks for your help.

Upvotes: 0

Views: 984

Answers (1)

ouroboros1

Reputation: 14064

One approach could be as follows:

  • Use df.groupby to create a df with first and last indices for all groups in column id (here assigned to g).
  • Now, for both didx and puidx, we use Series.map to assign either -1 or -10.
  • Next, we chain Series.where combined with Series.duplicated. For didx, the condition is ~df.id.duplicated(keep='last'), which is True for the last row of each id. Where it is True, we keep the mapped value; where it is False (a later row with the same id exists), we take the last index of the "group" instead (so: g['last'] mapped to df.id). For puidx, we change last to first both times.

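import pandas as pd

# Sample data from the question; 'index' is assumed to be a regular column, as the groupby below requires
df = pd.DataFrame({'index': [0, 1, 2, 3, 4, 5],
                   'id': [1, 2, 3, 3, 4, 4],
                   'type': ['d', 'pu', 'pu', 'd', 'pu', 'd']})

# First and last 'index' value per id
g = df.groupby('id')['index'].agg(['first','last'])

# Default from 'type'; rows that are not the last of their id get the group's last index
df['didx'] = df.type.map({'d':-1,'pu':-10}).where(
    ~df.id.duplicated(keep='last'),
    df.id.map(g['last']))

# Mirror image: rows that are not the first of their id get the group's first index
df['puidx'] = df.type.map({'d':-10,'pu':-1}).where(
    ~df.id.duplicated(keep='first'),
    df.id.map(g['first']))

print(df.set_index('index'))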

       id type  didx  puidx
index                      
0       1    d    -1    -10
1       2   pu   -10     -1
2       3   pu     3     -1
3       3    d    -1      2
4       4   pu     5     -1
5       4    d    -1      4
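
To see how the pieces fit together, it may help to lay out the intermediate Series for didx side by side. The following is a minimal sketch on the sample data. The names `base_didx`, `is_last`, `last_idx` and `steps` are only illustrative, and 'index' is assumed to be a regular column.

```python
import pandas as pd

# Sample data from the question; 'index' is assumed to be a regular column.
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 5],
    "id": [1, 2, 3, 3, 4, 4],
    "type": ["d", "pu", "pu", "d", "pu", "d"],
})

# First and last 'index' value per id.
g = df.groupby("id")["index"].agg(["first", "last"])

# Step 1: default value derived from 'type' alone.
base_didx = df["type"].map({"d": -1, "pu": -10})

# Step 2: True for the last row of each id (including single-row ids).
is_last = ~df["id"].duplicated(keep="last")

# Step 3: last 'index' of the group, aligned to every row.
last_idx = df["id"].map(g["last"])

# Series.where keeps base_didx where is_last is True, else takes last_idx.
steps = pd.DataFrame({
    "base_didx": base_didx,
    "is_last": is_last,
    "last_idx": last_idx,
    "didx": base_didx.where(is_last, last_idx),
})
print(steps)
```

Note that this relies on the 'pu' row coming before the 'd' row within a two-row id, as in the sample data. If the order can vary, sorting by id and type first would be needed.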

Upvotes: 1
