Reputation: 175
I'm trying to join two data frames based on multiple columns. However, one of the conditions is not straight forward, because one column in one data frame exists in column of lists in the other data frame. As following
df_a :
df_b:
I want to join both data frames based on node,channel,value. However, value is inside a list in column trail
I have tried to create a dictionary from df_a to have the node and channel as keys, but that won't be unique, I added the value as a key but then I couldn't look that value up. Any tip to solve this issue is much appreciated
Upvotes: 2
Views: 709
Reputation: 153460
Update per comment by @JonClements if not always the first element try:
(df_b.assign(value=df_b['trail'].str.split(','))
.explode('value')
.merge(df_a, on=['node', 'channel', 'value']))
Try, if value is always the first element in trail:
import pandas as pd
df_a = pd.DataFrame({'node':['a']*4,
'channel':[1,2,1,2],
'wavelength':[555, 666, 555, 666],
'value':['1-111-5', '1-222-6', '1-111-7', '1-222-8']})
df_b = pd.DataFrame({'node':['a']*4,
'channel':[1,2,1,2],
'trail':['1-111-5,1-a-5,2-r-3',
'1-222-6,2-4-5,1-d-3',
'1-111-7,1-a-2,1-f-t',
'1-222-8,1-e-4'],
'power':[10,12,13,12]})
df_out = (df_b.assign(value=df_b['trail'].str.split(',').str[0])
.merge(df_a, on=['node', 'channel', 'value']))
print(df_out)
Output:
node channel trail power value wavelength
0 a 1 1-111-5,1-a-5,2-r-3 10 1-111-5 555
1 a 2 1-222-6,2-4-5,1-d-3 12 1-222-6 666
2 a 1 1-111-7,1-a-2,1-f-t 13 1-111-7 555
3 a 2 1-222-8,1-e-4 12 1-222-8 666
Upvotes: 1