Reputation: 820
I have a dataframe like so:
id_a id_b col_a
NaN NaN NaN
0 133 [23]
7 191 [10,7]
10 303 [1]
23 200 [0,7,10]
I need to create a new column like so:
id_a id_b col_a col_b
NaN NaN NaN NaN
0 133 [23] [200]
7 191 [10,7] [303,191]
10 303 [1] [101]
23 200 [0,7,10] [133,191,303]
The logic is that I am returning the ids from id_b
based on a corresponding value in id_a
.
Example: for the first list [1,0,7];
I return 101, from id_b
since the corresponding value with 1 in id_b
is 101. Then 133 since 0 matches with 133 and finally 191 since 7 is the corresponding value here.
I have tried using .loc
to capture just those rows but haven't got far. Any help is greatly appreciated
Upvotes: 1
Views: 61
Reputation: 75080
With df.stack()
:
d = dict(zip(df['id_a'],df['id_b']))
df['col_b'] = (pd.DataFrame(df['col_a'].tolist()).replace(d)
.stack().groupby(level=0).agg(list))
Or:
apply
:
df['col_b'] = df['col_a'].apply(lambda x: [d.get(i) for i in x])
print(df)
id_a id_b col_a col_b
0 1 101 [1, 0, 7] [101, 133, 191]
1 0 133 [23] [200]
2 7 191 [10, 7] [303, 191]
3 10 303 [1] [101]
4 23 200 [0, 7, 10] [133, 191, 303]
Upvotes: 3
Reputation: 323226
IIUC explode
then map
df.col_a.explode().map(dict(zip(df.id_a,df.id_b))).groupby(level=0).agg(list)
0 [101, 133, 191]
1 [200]
2 [303, 191]
3 [101]
4 [133, 191, 303]
Name: col_a, dtype: object
Upvotes: 3