Reputation:
I have two dataframes
First dataframe, df1
UserMasterId Status Count
0 1296.0 5 5
1 1316.0 5 9
2 1325.0 5 14
3 1332.0 5 5
4 1337.0 5 44
5 1342.0 5 2
6 1344.0 5 18
Second dataframe, df2
UserMasterId Status Count
0 1325.0 0 2
1 1332.0 0 1
2 1337.0 0 1
3 1342.0 0 3
4 1344.0 0 1
Here we have same ID in both dataframes..
When I use Concat,
result = pd.concat([df1, df2], axis=1, sort=True)
result
The output I get is,
UserMasterId Status Count UserMasterId Status Count
0 1296.0 5 5 1325.0 0.0 2.0
1 1316.0 5 9 1332.0 0.0 1.0
2 1325.0 5 14 1337.0 0.0 1.0
3 1332.0 5 5 1342.0 0.0 3.0
4 1337.0 5 44 1344.0 0.0 1.0
5 1342.0 5 2 NaN NaN NaN
6 1344.0 5 18 NaN NaN NaN
When i use Merge,
result = pd.merge(df1,df2[['UserMasterId', 'Count','Status']],on='UserMasterId')
result.head()
The output is,
UserMasterId Status_x Count_x Count_y Status_y
0 1325.0 5 14 2 0
1 1332.0 5 5 1 0
2 1337.0 5 44 1 0
3 1342.0 5 2 3 0
4 1344.0 5 18 1 0
This removes the ID that is not common in df1 & df2.
I don't want to remove the ID that is not common to both dataframes. I want the output like this,
UserMasterId Status_x Count_x Count_y Status_y
0 1296.0 5 5 NA NA
1 1316.0 5 9 NA NA
2 1325.0 5 14 2 0
3 1332.0 5 5 1 0
4 1337.0 5 44 1 0
5 1342.0 5 2 3 0
6 1344.0 5 18 1 0
Can anyone help me out?
Upvotes: 1
Views: 56
Reputation: 7604
Just use how='outer'
result = pd.merge(df1,df2[['UserMasterId', 'Count','Status']],on='UserMasterId', how='outer')
print(result)
UserMasterId Status_x Count_x Count_y Status_y
0 1296.0 5 5 NaN NaN
1 1316.0 5 9 NaN NaN
2 1325.0 5 14 2.0 0.0
3 1332.0 5 5 1.0 0.0
4 1337.0 5 44 1.0 0.0
5 1342.0 5 2 3.0 0.0
6 1344.0 5 18 1.0 0.0
Upvotes: 0
Reputation: 1096
Using merge with an outer join should do it was well, right? I did not test it however
result = pd.merge(df1,df2[['UserMasterId', 'Count','Status']],on='UserMasterId', how='outer')
Upvotes: 1
Reputation: 91
pd.concat
is the solution here, but you need to add the ignore_index
parameter and not set axis
to 1
, as such:
>>> df1 = pd.DataFrame({'a': [0, 1], 'b': [2, 3]})
>>> df2 = pd.DataFrame({'b': [4, 5], 'a': [5, 6]})
>>> pd.concat([df1, df2], ignore_index=True)
a b
0 0 2
1 1 3
2 5 4
3 6 5
Upvotes: 0