Reputation: 67
I have a dataframeA
ID-A ID'-A Type Date
1 2 A 08/2020
3 2 A 09/2020
3 6 A 09/2020
2 4 A 09/2020
2 8 A 12/2020
and a dataframeB
ID-B ID'-B Type Date
5 2 B 09/2020
6 3 B 10/2020
8 1 B 10/2020
4 10 B 11/2020
i want this output : iterate over DataframeB by date and if ID'-B = ID-A copy the row next
ID-A ID'-A Type Date ID-B ID'-B Type Date
1 2 A 08/2020 8 1 B 10/2020
3 2 A 09/2020 6 3 B 10/2020
3 6 A 09/2020 Nan Nan Nan Nan
2 4 A 09/2020 5 2 B 09/2020
2 8 A 12/2020 Nan Nan Nan Nan
Nan Nan Nan Nan 4 10 B 11/2020
I tried to use the merge with keys :
out = pd.merge(DataframeA ,DataframeB , how='left' , left_on =['ID-A'] , right_on =['ID'-B'])
but it give me a duplicate rows
ID-A ID'-A Type Date ID-B ID'-B Type Date
1 2 A 08/2020 8 1 B 10/2020
3 2 A 09/2020 6 3 B 10/2020
3 6 A 09/2020 6 3 B 10/2020
2 4 A 09/2020 5 2 B 09/2020
2 8 A 12/2020 5 2 B 09/2020
Nan Nan Nan Nan 4 10 B 11/2020
Upvotes: 0
Views: 314
Reputation: 16147
You're going to have to rename your second dataframes columns, otherwise you're going to end up with duplicate columns and the merge with suffix them with _x
and _y
.
So you need to rename your dfb
column first.
Then you can use duplicated
to select only the first records from dfa
to merge on, then concat
to add the duplicates back after the merge.
import pandas as pd
dfa = pd.DataFrame({'ID-A': {0: 1, 1: 3, 2: 3, 3: 2, 4: 2},
"ID'-A": {0: 2, 1: 2, 2: 6, 3: 4, 4: 8},
'Type': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A'},
'Date': {0: '08/2020',
1: '09/2020',
2: '09/2020',
3: '09/2020',
4: '12/2020'}})
dfb = pd.DataFrame({'ID-B': {0: 5, 1: 6, 2: 8, 3: 4},
"ID'-B": {0: 2, 1: 3, 2: 1, 3: 10},
'Type': {0: 'B', 1: 'B', 2: 'B', 3: 'B'},
'Date': {0: '09/2020', 1: '10/2020', 2: '10/2020', 3: '11/2020'}})
dfb.columns = ['ID-B',"ID'-B",'Type-B','Date-B']
pd.concat([dfa[~dfa.duplicated(subset=['ID-A'])].merge(dfb, left_on=['ID-A'], right_on=["ID'-B"], how='left'),
dfa[dfa.duplicated(subset=['ID-A'])]],axis=0)
Output
ID-A ID'-A Type Date ID-B ID'-B Type-B Date-B
0 1 2 A 08/2020 8.0 1.0 B 10/2020
1 3 2 A 09/2020 6.0 3.0 B 10/2020
2 2 4 A 09/2020 5.0 2.0 B 09/2020
2 3 6 A 09/2020 NaN NaN NaN NaN
4 2 8 A 12/2020 NaN NaN NaN NaN
Upvotes: 1