Reputation:

Combine same columns in a dataframe

I have two dataframes

First dataframe, df1

    UserMasterId    Status  Count
  0 1296.0               5  5
  1 1316.0               5  9
  2 1325.0               5  14
  3 1332.0               5  5
  4 1337.0               5  44
  5 1342.0               5  2
  6 1344.0               5  18

Second dataframe, df2

    UserMasterId    Status  Count
  0 1325.0               0  2
  1 1332.0               0  1
  2 1337.0               0  1
  3 1342.0               0  3
  4 1344.0               0  1

Here we have same ID in both dataframes..

When I use Concat,

result = pd.concat([df1, df2], axis=1, sort=True)
result

The output I get is,

    UserMasterId    Status  Count   UserMasterId    Status  Count
0   1296.0               5      5         1325.0       0.0    2.0
1   1316.0               5      9         1332.0       0.0    1.0
2   1325.0               5     14         1337.0       0.0    1.0
3   1332.0               5      5         1342.0       0.0    3.0
4   1337.0               5     44         1344.0       0.0    1.0
5   1342.0               5      2            NaN       NaN    NaN
6   1344.0               5     18            NaN       NaN    NaN

When i use Merge,

result = pd.merge(df1,df2[['UserMasterId', 'Count','Status']],on='UserMasterId')
result.head()

The output is,

    UserMasterId    Status_x    Count_x Count_y Status_y
0         1325.0           5         14      2         0
1         1332.0           5          5      1         0
2         1337.0           5         44      1         0
3         1342.0           5          2      3         0
4         1344.0           5         18      1         0

This removes the ID that is not common in df1 & df2.

I don't want to remove the ID that is not common to both dataframes. I want the output like this,

UserMasterId    Status_x    Count_x Count_y Status_y
0     1296.0           5          5      NA       NA
1     1316.0           5          9      NA       NA
2     1325.0           5         14      2         0
3     1332.0           5          5      1         0
4     1337.0           5         44      1         0
5     1342.0           5          2      3         0
6     1344.0           5         18      1         0

Can anyone help me out?

Upvotes: 1

Answers (4)

NYC Coder

Reputation: 7604

Just use how='outer'

result = pd.merge(df1,df2[['UserMasterId', 'Count','Status']],on='UserMasterId', how='outer')
print(result)

   UserMasterId  Status_x  Count_x  Count_y  Status_y
0        1296.0         5        5      NaN       NaN
1        1316.0         5        9      NaN       NaN
2        1325.0         5       14      2.0       0.0
3        1332.0         5        5      1.0       0.0
4        1337.0         5       44      1.0       0.0
5        1342.0         5        2      3.0       0.0
6        1344.0         5       18      1.0       0.0

Upvotes: 0

dheinz

Reputation: 1096

Using merge with an outer join should do it was well, right? I did not test it however

result = pd.merge(df1,df2[['UserMasterId', 'Count','Status']],on='UserMasterId', how='outer')

Upvotes: 1

Thomas Kok

Reputation: 91

pd.concat is the solution here, but you need to add the ignore_index parameter and not set axis to 1, as such:

>>> df1 = pd.DataFrame({'a': [0, 1], 'b': [2, 3]})
>>> df2 = pd.DataFrame({'b': [4, 5], 'a': [5, 6]})
>>> pd.concat([df1, df2], ignore_index=True)
   a  b
0  0  2
1  1  3
2  5  4
3  6  5

Upvotes: 0

YannTC

Reputation: 33

You can try to use the method append :

df3 = df1.append(df2)

Upvotes: 0

Combine same columns in a dataframe

Answers (4)

Related Questions