How to join concatenated values to new values in python

Question

Hi im new to python and trying to understand joining

I have two dataframe -

df1
OutputValues
12-99
22-99
264-99
12-323,138-431
4-21
12-123

df2
OldId   NewId
99      191
84       84
323      84
59       59
431      59
208      59
60       59
58       59
325      59
390      59
324      59
564     564
123     564
21      21

I want to join both of these based on the second half of the values in df1 i.e. the values after the hifen, for example 12--99 joins old id 99 in df2 and 4-21 to old id 21.

The final new output dataframe should join to the new values in df2 and look like-

df3
OutputValues     OutputValues2
12-99               12-191
22-99               22-191
264-99              264-191
12-323,138-431      12-323,138-59
4-21                4-21
12-123,4-325        12-564,4-59

As you see, now the first part of the concatenation is joined with the new id in my desired final output dataframe df3 where there is 99 it is replaced with 191, 123 is replaced with 564 and 325 with 59,etc

How can i do this?

Quang Hoang · Accepted Answer

Let's extract both parts, map the last part then concatenate back:

s = df1.OutputValues.str.extractall('(\d+-)(\d+)');
df1['OutputValues2'] = (s[0]+s[1].map(df2.astype(str).set_index('OldId')['NewId'])
).groupby(level=0).agg(','.join)

Output:

     OutputValues OutputValues2
0           12-99        12-191
1           22-99        22-191
2          264-99       264-191
3  12-323,138-431  12-84,138-59
4            4-21          4-21
5          12-123        12-564

Update: Looks like simple replace would also work, but this might fail in some edge cases:

df1['OutputValues2'] = df1.OutputValues.replace(('-'+df2.astype(str))
                                                     .set_index('OldId')['NewId'], 
                                                 regex=True)

How to join concatenated values to new values in python

Answers (2)

Related Questions