user12233274
user12233274

Reputation:

How to join concatenated values to new values in python

Hi im new to python and trying to understand joining

I have two dataframe -

df1
OutputValues
12-99
22-99
264-99
12-323,138-431
4-21
12-123

df2
OldId   NewId
99      191
84       84
323      84
59       59
431      59
208      59
60       59
58       59
325      59
390      59
324      59
564     564
123     564
21      21

I want to join both of these based on the second half of the values in df1 i.e. the values after the hifen, for example 12--99 joins old id 99 in df2 and 4-21 to old id 21.

The final new output dataframe should join to the new values in df2 and look like-

df3
OutputValues     OutputValues2
12-99               12-191
22-99               22-191
264-99              264-191
12-323,138-431      12-323,138-59
4-21                4-21
12-123,4-325        12-564,4-59

As you see, now the first part of the concatenation is joined with the new id in my desired final output dataframe df3 where there is 99 it is replaced with 191, 123 is replaced with 564 and 325 with 59,etc

How can i do this?

Upvotes: 0

Views: 46

Answers (2)

wwnde
wwnde

Reputation: 26676

df1=df1['OutputValues'].str.split(',').explode().str.split('\-',expand=True).join(df1)#Separate explode to separate OutputValues and join them back to df1
df3=df2.astype(str).merge(g, left_on='OldId', right_on=1)#merge df2 and new df1
df3=df3.assign(OutputValues2=df3[0].str.cat(h.NewId, sep='-')).drop(columns=['OldId','NewId',0,1])#Create OutputValues2 and drop unrequired columns
df3.groupby('OutputValues')['OutputValues2'].agg(','.join).reset_index()





     OutputValues OutputValues2
0          12-123        12-564
1  12-323,138-431  12-84,138-59
2           12-99        12-191
3           22-99        22-191
4          264-99       264-191
5            4-21          4-21

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150745

Let's extract both parts, map the last part then concatenate back:

s = df1.OutputValues.str.extractall('(\d+-)(\d+)');
df1['OutputValues2'] = (s[0]+s[1].map(df2.astype(str).set_index('OldId')['NewId'])
).groupby(level=0).agg(','.join)

Output:

     OutputValues OutputValues2
0           12-99        12-191
1           22-99        22-191
2          264-99       264-191
3  12-323,138-431  12-84,138-59
4            4-21          4-21
5          12-123        12-564

Update: Looks like simple replace would also work, but this might fail in some edge cases:

df1['OutputValues2'] = df1.OutputValues.replace(('-'+df2.astype(str))
                                                     .set_index('OldId')['NewId'], 
                                                 regex=True)

Upvotes: 1

Related Questions