Beskimo
Beskimo

Reputation: 21

Python Pandas: Iteratively create new column with value from a different row within group of a dataframe

Using Python/Pandas I am trying to transform a dataframe by creating two new columns (A and B) conditional on values from different lines (from column ID3), but from within the same group (as determined by ID1).

For each ID1 group, I want to take the ID2 value where ID3 is equal to 31 and put this value in a new column called A conditional on ID3 being a 1 or a 2. Similarly, I want to take the ID2 value where ID3 is equal to 41 and put this value in a new column called B, again conditional on ID3 being a 1 or a 2.

Assuming I have a dataframe in the following format:

import pandas as pd

df = pd.DataFrame({'ID1': (1, 1, 1, 1, 2, 2, 2), 'ID2': (151, 152, 153, 154, 261, 262, 263), 'ID3': (1, 2, 31, 41, 1, 2, 41), 'ID4': (2, 2, 1, 2, 1, 1, 2)})
print(df)

     ID1  ID2  ID3  ID4
0    1   151    1    2
1    1   152    2    2
2    1   153   31    1
3    1   154   41    2
4    2   261    1    1
5    2   262    2    1
6    2   263   41    2

Post-transformation format should look like what is shown below. Where columns A and B are populated with values from ID2, conditional on values within ID3.

    ID1  ID2  ID3  ID4    A    B
0    1   151    1    2    153  154
1    1   152    2    2    153  154
2    1   153   31    1   
3    1   154   41    2
4    2   261    1    1
5    2   262    2    1         263
6    2   263   41    2         263

I have attempted using what is shown below, but transform will retain the same number of values as the original dataset. This poses a problem for the lines in which ID3 = 31 or 41. Also, it returns the ID2 value by default if there is no ID2 value of 31 within the group.

df['A'] = df.groupby('ID1')['ID2'].transform(lambda x: x.loc[df['ID3'] == 31])
df['B'] = df.groupby('ID1')['ID2'].transform(lambda x: x.loc[df['ID3'] == 41])

Result:

   ID1  ID2  ID3  ID4   A   B
0    1   151    1    2  153  154
1    1   152    2    2  153  154
2    1   153   31    1  153  154
3    1   154   41    2  153  154
4    2   261    1    1  261  263
5    2   262    2    1  262  263
6    2   263   41    2  263  263

Any suggestions? Thank you in advance!

Upvotes: 1

Views: 94

Answers (1)

DJK
DJK

Reputation: 9264

In no why do I think this is the best solution, but it its a solution.

You can replace .loc with .where, which will return NaN wherever the condition is not true. Then backfill NaN, and then again filter with .where on ID3 being 1 or 2

df['A'] = df.groupby('ID1')['ID2'].transform(lambda x: 
       x.where(df.ID3==31).fillna(method='bfill').where(df.ID3.isin([1,2])))

df['B'] = df.groupby('ID1')['ID2'].transform(lambda x: 
       x.where(df.ID3==41).fillna(method='bfill').where(df.ID3.isin([1,2])))

   ID1  ID2  ID3  ID4      A      B
0    1  151    1    2  153.0  154.0
1    1  152    2    2  153.0  154.0
2    1  153   31    1    NaN    NaN
3    1  154   41    2    NaN    NaN
4    2  261    1    1    NaN  263.0
5    2  262    2    1    NaN  263.0
6    2  263   41    2    NaN    NaN

Upvotes: 1

Related Questions