immb31
immb31

Reputation: 75

Merge dataframes

I'm trying to merge this two dataframes:

df1=  
     pais   ano  cantidad
 0  Chile  2000        10
 1  Chile  2001        11
 2  Chile  2002        12

df2=
     pais   ano  cantidad
 0  Chile  1999         0
 1  Chile  2000         0
 2  Chile  2001         0
 3  Chile  2002         0
 4  Chile  2003         0

I'm trying to merge df1 into df2 and replace the existing año rows with those from df1. This is the code that I'm trying right now and what I'm getting:

df=df1.combine_first(df2)

df=
    pais    ano     cantidad
0   Chile   2000.0  10.0
1   Chile   2001.0  11.0
2   Chile   2002.0  12.0
3   Chile   2002.0  0.0
4   Chile   2003.0  0.0

As you can see, row corresponding to 1999 is missing and the one for 2002 with 'cantidad'= 0 shoudn't be there. My desired output is this:

df=
    pais    ano     cantidad
0   Chile   1999    0
1   Chile   2000    10
2   Chile   2001    11
3   Chile   2002    12
4   Chile   2003    0

Any ideas? Thank you!

Upvotes: 1

Views: 64

Answers (2)

Michael Szczesny
Michael Szczesny

Reputation: 5036

You can perform a left join on df2 and fillna missing values from df2.cantidad. I'm joining on pais and ano because I assume in your real dataframe are more countries than 'chile'.

df = df2[['pais','ano']].merge(df1, on=['pais','ano'], how='left').fillna({'cantidad': df2.cantidad})
df.cantidad = df.cantidad.astype('int')
df

Out:

    pais   ano  cantidad
0  Chile  1999         0
1  Chile  2000        10
2  Chile  2001        11
3  Chile  2002        12
4  Chile  2003         0

Upvotes: 0

Roim
Roim

Reputation: 3066

Add how='outer param to the merge.

By default, merge works with "inner", which means it takes only values which are in both dataframe (intersection) while you want union of those sections.

Also, you may want to add on="ano" to declare on which column you want to merge. It may not be needed on your case, but it's worth to check it out.

Please check Pandas Merging 101 for more details

Upvotes: 1

Related Questions