Jeff Saltfist

Reputation: 943

Pandas - Merging Different Sized DataFrames

I am having an issue merging two frames with different numbers of rows. The first dataframe has 5K rows, and the second dataframe has 20K rows. There is a column "id" in both frames, and all 5K "id" values occur in the frame with 20K rows.

first frame "df"

     A    B    id    A_1    B_1
0    1    1    1     0.5    0.5
1    3    2    2     0.2    0.4
2    3    4    3     0.8    0.9

second frame "df_2"

     A    B    id    
0    1    1    1    
1    3    2    2    
2    3    4    3    
3    1    2    4    
4    3    1    5     

Hopeful output frame "df_out"

     A    B    id    A_1    B_1
0    1    1    1     0.5    0.5
1    3    2    2     0.2    0.4
2    3    4    3     0.8    0.9
3    1    2    4     na     na
4    3    1    5     na     na

My attempts to merge on 'id' have left me with only the 5K rows. The operation I am seeking is to preserve all the rows of the large dataframe, and fill in NaN values for the data that does not exist in the small frame.

Thanks

Upvotes: 1

Views: 123

Answers (1)

miradulo

Reputation: 29680

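Just specify how='outer' in df.merge so that the result keeps the union of the rows from both DataFrames. With no "on" argument, merge joins on every column the two frames have in common (here A, B and id).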

>>> df.merge(df_2, how='outer')
     A  A_1    B  B_1   id
0  1.0  0.5  1.0  0.5  1.0
1  3.0  0.2  2.0  0.4  2.0
2  3.0  0.8  4.0  0.9  3.0
3  1.0  NaN  2.0  NaN  4.0
4  3.0  NaN  1.0  NaN  5.0
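
As a variation on the call above, here is a minimal sketch that joins explicitly on "id" and pulls only the extra columns (A_1, B_1) from the smaller frame. The two frames are rebuilt from the sample data in the question. Putting df_2 on the left with how='left' is an assumption about the intent (keep every row of the larger frame), not something stated in the answer above.

```python
import pandas as pd

# Small frame from the question: has the extra columns A_1 and B_1.
df = pd.DataFrame({
    "A": [1, 3, 3],
    "B": [1, 2, 4],
    "id": [1, 2, 3],
    "A_1": [0.5, 0.2, 0.8],
    "B_1": [0.5, 0.4, 0.9],
})

# Large frame from the question: contains every id found in df, plus more.
df_2 = pd.DataFrame({
    "A": [1, 3, 3, 1, 3],
    "B": [1, 2, 4, 2, 1],
    "id": [1, 2, 3, 4, 5],
})

# Assumption: keep all rows of the large frame and look up A_1/B_1 by id.
# Selecting only id, A_1 and B_1 from df avoids duplicated A/B columns.
df_out = df_2.merge(df[["id", "A_1", "B_1"]], on="id", how="left")

print(df_out)
```

Rows whose id has no match in df (ids 4 and 5 here) get NaN in A_1 and B_1. Because A, B and id come only from df_2, they should keep their integer dtype.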

Upvotes: 3
