Reputation:
Hi im new to python and trying to understand joining
I have two dataframe -
df1
OutputValues
12-99
22-99
264-99
12-323,138-431
4-21
12-123
df2
OldId NewId
99 191
84 84
323 84
59 59
431 59
208 59
60 59
58 59
325 59
390 59
324 59
564 564
123 564
21 21
I want to join both of these based on the second half of the values in df1 i.e. the values after the hifen, for example 12--99 joins old id 99 in df2 and 4-21 to old id 21.
The final new output dataframe should join to the new values in df2 and look like-
df3
OutputValues OutputValues2
12-99 12-191
22-99 22-191
264-99 264-191
12-323,138-431 12-323,138-59
4-21 4-21
12-123,4-325 12-564,4-59
As you see, now the first part of the concatenation is joined with the new id in my desired final output dataframe df3 where there is 99 it is replaced with 191, 123 is replaced with 564 and 325 with 59,etc
How can i do this?
Upvotes: 0
Views: 46
Reputation: 26676
df1=df1['OutputValues'].str.split(',').explode().str.split('\-',expand=True).join(df1)#Separate explode to separate OutputValues and join them back to df1
df3=df2.astype(str).merge(g, left_on='OldId', right_on=1)#merge df2 and new df1
df3=df3.assign(OutputValues2=df3[0].str.cat(h.NewId, sep='-')).drop(columns=['OldId','NewId',0,1])#Create OutputValues2 and drop unrequired columns
df3.groupby('OutputValues')['OutputValues2'].agg(','.join).reset_index()
OutputValues OutputValues2
0 12-123 12-564
1 12-323,138-431 12-84,138-59
2 12-99 12-191
3 22-99 22-191
4 264-99 264-191
5 4-21 4-21
Upvotes: 0
Reputation: 150745
Let's extract both parts, map the last part then concatenate back:
s = df1.OutputValues.str.extractall('(\d+-)(\d+)');
df1['OutputValues2'] = (s[0]+s[1].map(df2.astype(str).set_index('OldId')['NewId'])
).groupby(level=0).agg(','.join)
Output:
OutputValues OutputValues2
0 12-99 12-191
1 22-99 22-191
2 264-99 264-191
3 12-323,138-431 12-84,138-59
4 4-21 4-21
5 12-123 12-564
Update: Looks like simple replace
would also work, but this might fail in some edge cases:
df1['OutputValues2'] = df1.OutputValues.replace(('-'+df2.astype(str))
.set_index('OldId')['NewId'],
regex=True)
Upvotes: 1