Reputation: 177
I have a data frame in which each id appears in either one row or two rows.
I need to create two new columns. If an id group has only one row, the values are determined by a condition on the 'type' column alone. If the group has two rows, the values depend on 'type' and are taken from the other row of the group.
An example explains it better.
We start with this data frame:
index | id | type |
---|---|---|
0 | 1 | d |
1 | 2 | pu |
2 | 3 | pu |
3 | 3 | d |
4 | 4 | pu |
5 | 4 | d |
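For anyone who wants to reproduce this, the sample frame above can be built roughly like this. It is a minimal sketch that assumes `index` is stored as a regular column rather than as the DataFrame's own index:

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: `index` is an ordinary column, not the DataFrame index.
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 5],
    "id": [1, 2, 3, 3, 4, 4],
    "type": ["d", "pu", "pu", "d", "pu", "d"],
})
print(df)
```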
The expected output follows these rules:

    if 1 row per 'id':
        if type == 'd':
            didx = -1
            puidx = -10
        elif type == 'pu':
            didx = -10
            puidx = -1
    elif 2 rows per 'id':
        if type == 'd':
            didx = -1
            puidx = index of the other row with same 'id'
        elif type == 'pu':
            didx = index of the other row with same 'id'
            puidx = -1

This gives:
index | id | type | didx | puidx |
---|---|---|---|---|
0 | 1 | d | -1 | -10 |
1 | 2 | pu | -10 | -1 |
2 | 3 | pu | 3 | -1 |
3 | 3 | d | -1 | 2 |
4 | 4 | pu | 5 | -1 |
5 | 4 | d | -1 | 4 |
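To pin the rules down, here is a literal row-by-row sketch of them. It is slow and only meant as a reference to check faster solutions against. The helper name `reference_rows` is made up for illustration, and `index` is assumed to be a regular column:

```python
import pandas as pd

# Sample data from the question (assumption: `index` is a regular column).
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 5],
    "id": [1, 2, 3, 3, 4, 4],
    "type": ["d", "pu", "pu", "d", "pu", "d"],
})

def reference_rows(frame):
    """Row-by-row translation of the rules (hypothetical helper, not efficient)."""
    didx, puidx = [], []
    for _, row in frame.iterrows():
        # Index values of the other rows that share this row's id (zero or one row).
        same_id = frame.loc[
            (frame["id"] == row["id"]) & (frame["index"] != row["index"]), "index"
        ]
        # With no partner row, the placeholder is -10.
        other = int(same_id.iloc[0]) if len(same_id) else -10
        if row["type"] == "d":
            didx.append(-1)
            puidx.append(other)
        else:  # type == "pu"
            didx.append(other)
            puidx.append(-1)
    return frame.assign(didx=didx, puidx=puidx)

expected = reference_rows(df)
print(expected)
```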
I assume we need to use groupby and apply, but I am not sure how to make that efficient. Thanks for your help.
Upvotes: 0
Views: 984
Reputation: 14064
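One approach could be as follows:

- First, use `df.groupby` to create a `df` with the `first` and `last` indices for all groups in column `id` (here assigned to `g`).
- To initialise `didx` and `puidx`, we use `Series.map` on `type` to assign either `-1` or `-10`.
- Next, we chain `Series.where` combined with `Series.duplicated`. For `didx`, the condition is `~df.id.duplicated(keep='last')`: where it is `True` (the row is the last one for its `id`) we keep the mapped value; where it is `False` we take the last index for the "group" (so: `g['last']` mapped to `df.id`). For `puidx`, we change `last` to `first` both times.

    # `index` is assumed to be a regular column of df.
    # First and last row index for each id.
    g = df.groupby('id')['index'].agg(['first','last'])

    # Defaults by type; rows that are not the last of their id get the group's last index.
    df['didx'] = df.type.map({'d':-1,'pu':-10}).where(
        ~df.id.duplicated(keep='last'),
        df.id.map(g['last']))

    # Mirrored: rows that are not the first of their id get the group's first index.
    df['puidx'] = df.type.map({'d':-10,'pu':-1}).where(
        ~df.id.duplicated(keep='first'),
        df.id.map(g['first']))

    print(df)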
           id type  didx  puidx
    index
    0       1    d    -1    -10
    1       2   pu   -10     -1
    2       3   pu     3     -1
    3       3    d    -1      2
    4       4   pu     5     -1
    5       4    d    -1      4
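Note that the `duplicated`-based masks above rely on the `pu` row coming before the `d` row within each two-row group, as it does in the sample. If that order is not guaranteed, one possible variant is to compute the index of the "other" row directly. This is only a sketch, checked against the sample data; the names `grp`, `partner` and `is_d` are mine, and `index` is again assumed to be a regular column:

```python
import pandas as pd

# Sample data from the question (assumption: `index` is a regular column).
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 4, 5],
    "id": [1, 2, 3, 3, 4, 4],
    "type": ["d", "pu", "pu", "d", "pu", "d"],
})

grp = df.groupby("id")["index"]

# For groups of at most two rows, first + last - own index is the other row's index.
partner = grp.transform("first") + grp.transform("last") - df["index"]

# Single-row groups have no partner, so they get the -10 placeholder.
partner = partner.where(grp.transform("size") > 1, -10)

is_d = df["type"].eq("d")
df["didx"] = partner.where(~is_d, -1)   # 'd' rows get -1, 'pu' rows get the partner
df["puidx"] = partner.where(is_d, -1)   # 'pu' rows get -1, 'd' rows get the partner
print(df)
```

This avoids `apply` as well, and the result does not depend on which of the two rows comes first. It does assume that no `id` has more than two rows.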
Upvotes: 1