Reputation: 21
Using Python/Pandas I am trying to transform a dataframe by creating two new columns (A
and B
) conditional on values from different lines (from column ID3
), but from within the same group (as determined by ID1
).
For each ID1
group, I want to take the ID2
value where ID3
is equal to 31
and put this value in a new column called A
conditional on ID3
being a 1
or a 2
. Similarly, I want to take the ID2
value where ID3
is equal to 41
and put this value in a new column called B
, again conditional on ID3
being a 1
or a 2
.
Assuming I have a dataframe in the following format:
import pandas as pd
df = pd.DataFrame({'ID1': (1, 1, 1, 1, 2, 2, 2), 'ID2': (151, 152, 153, 154, 261, 262, 263), 'ID3': (1, 2, 31, 41, 1, 2, 41), 'ID4': (2, 2, 1, 2, 1, 1, 2)})
print(df)
ID1 ID2 ID3 ID4
0 1 151 1 2
1 1 152 2 2
2 1 153 31 1
3 1 154 41 2
4 2 261 1 1
5 2 262 2 1
6 2 263 41 2
Post-transformation format should look like what is shown below. Where columns A
and B
are populated with values from ID2
, conditional on values within ID3
.
ID1 ID2 ID3 ID4 A B
0 1 151 1 2 153 154
1 1 152 2 2 153 154
2 1 153 31 1
3 1 154 41 2
4 2 261 1 1
5 2 262 2 1 263
6 2 263 41 2 263
I have attempted using what is shown below, but transform will retain the same number of values as the original dataset. This poses a problem for the lines in which ID3
= 31
or 41
. Also, it returns the ID2
value by default if there is no ID2
value of 31
within the group.
df['A'] = df.groupby('ID1')['ID2'].transform(lambda x: x.loc[df['ID3'] == 31])
df['B'] = df.groupby('ID1')['ID2'].transform(lambda x: x.loc[df['ID3'] == 41])
Result:
ID1 ID2 ID3 ID4 A B
0 1 151 1 2 153 154
1 1 152 2 2 153 154
2 1 153 31 1 153 154
3 1 154 41 2 153 154
4 2 261 1 1 261 263
5 2 262 2 1 262 263
6 2 263 41 2 263 263
Any suggestions? Thank you in advance!
Upvotes: 1
Views: 94
Reputation: 9264
In no why do I think this is the best solution, but it its a solution.
You can replace .loc
with .where
, which will return NaN
wherever the condition is not true. Then backfill NaN
, and then again filter with .where
on ID3
being 1 or 2
df['A'] = df.groupby('ID1')['ID2'].transform(lambda x:
x.where(df.ID3==31).fillna(method='bfill').where(df.ID3.isin([1,2])))
df['B'] = df.groupby('ID1')['ID2'].transform(lambda x:
x.where(df.ID3==41).fillna(method='bfill').where(df.ID3.isin([1,2])))
ID1 ID2 ID3 ID4 A B
0 1 151 1 2 153.0 154.0
1 1 152 2 2 153.0 154.0
2 1 153 31 1 NaN NaN
3 1 154 41 2 NaN NaN
4 2 261 1 1 NaN 263.0
5 2 262 2 1 NaN 263.0
6 2 263 41 2 NaN NaN
Upvotes: 1