Reputation: 89
I would like to join two files by key but I would like some columns to be joined together for example:
File1:
df1 = pd.DataFrame({'List' : ['P111', 'P999', 'P111;P999;P777', 'P555', 'P666;P111;P333'],
'Color' : ['red', 'red', 'blue','yellow', 'red']})
List Color
0 P111 red
1 P999 red
2 P111;P999;P777 blue
3 P555 yellow
4 P666;P111;P333 red
File2:
df2 = pd.DataFrame({'Cod' : ['P111', 'P222', 'P333', 'P444', 'P555', 'P666', 'P777'],
'Animal' : ['DOG', 'CAT', 'BUG','SNAKE,DOG', 'CAT,BUG', 'DOG', 'SNAKE'],
'Letter' : ['A,F', 'C', 'S,M', 'F,L', 'C,A','M,C', 'Z,L']})
Cod Animal Letter
0 P111 DOG A,F
1 P222 CAT C
2 P333 BUG S,M
3 P444 SNAKE,DOG F,L
4 P555 CAT,BUG C,A
5 P666 DOG M,C
6 P777 SNAKE Z,L
I would like to merge with primary key List (from file1) and Cod (from file2), to get:
List Color Animal Letter
0 P111 red DOG,FROG A,F
1 P999 red - -
2 P111;P999;P777 blue DOG,FROG|-|SNAKE A,F|-|Z,L
3 P555 yellow CAT,BUG C,A
4 P666;P111;P333 red DOG|DOG,FROG|BUG M,C|A,F|S,M
I think we need something like a left join and an agragation but I don't know how. In the final table I would like that in the values not found there was a - . While I would like a | to separate values in "aggregated" columns
Which is the best way?
Upvotes: 1
Views: 102
Reputation: 863166
Idea is use DataFrame.explode
by splitted values of List
, then use left join and aggregate first values with join
for expected ouput:
df = (df1.assign(Cod = df1['List'].str.split(';'))
.explode('Cod')
.reset_index()
.merge(df2, how='left', on='Cod')
.fillna('-')
.groupby('index')
.agg(List=('List','first'),
Color=('Color','first'),
Animal=('Animal','|'.join),
Letter=('Letter','|'.join))
.rename_axis(None))
print (df)
List Color Animal Letter
0 P111 red DOG A,F
1 P999 red - -
2 P111;P999;P777 blue DOG|-|SNAKE A,F|-|Z,L
3 P555 yellow CAT,BUG C,A
4 P666;P111;P333 red DOG|DOG|BUG M,C|A,F|S,M
Upvotes: 3
Reputation: 53
Please specify if you want to perform join or not. Since you didn't mention any primary key. I am assuming you want to concatenate.
import pandas as pd
df1 = pd.read_excel(pwd + '/Book1.xlsx')
df2 = pd.read_excel(pwd + '/Book2.xlsx')
df_combined = pd.concat([df1,df2], ignore_index=True)
print(df_combined)
list color cod animal
0 p111 red NaN NaN
1 p222 blue NaN NaN
2 p333 green NaN NaN
3 p444 yellow NaN NaN
4 p555 NaN NaN NaN
5 NaN NaN p111 rat
6 NaN NaN p222 cat
7 NaN NaN p333 bat
8 NaN NaN p444 hat
9 NaN NaN p555 cool
Result will be like above-
Also please provide sample data so that it can be in copy-pasted.
Upvotes: 0