Reputation: 75
I'm trying to merge this two dataframes:
df1=
pais ano cantidad
0 Chile 2000 10
1 Chile 2001 11
2 Chile 2002 12
df2=
pais ano cantidad
0 Chile 1999 0
1 Chile 2000 0
2 Chile 2001 0
3 Chile 2002 0
4 Chile 2003 0
I'm trying to merge df1 into df2 and replace the existing año rows with those from df1. This is the code that I'm trying right now and what I'm getting:
df=df1.combine_first(df2)
df=
pais ano cantidad
0 Chile 2000.0 10.0
1 Chile 2001.0 11.0
2 Chile 2002.0 12.0
3 Chile 2002.0 0.0
4 Chile 2003.0 0.0
As you can see, row corresponding to 1999 is missing and the one for 2002 with 'cantidad'= 0 shoudn't be there. My desired output is this:
df=
pais ano cantidad
0 Chile 1999 0
1 Chile 2000 10
2 Chile 2001 11
3 Chile 2002 12
4 Chile 2003 0
Any ideas? Thank you!
Upvotes: 1
Views: 64
Reputation: 5036
You can perform a left join
on df2
and fillna
missing values from df2.cantidad
. I'm joining on pais
and ano
because I assume in your real dataframe are more countries than 'chile'.
df = df2[['pais','ano']].merge(df1, on=['pais','ano'], how='left').fillna({'cantidad': df2.cantidad})
df.cantidad = df.cantidad.astype('int')
df
Out:
pais ano cantidad
0 Chile 1999 0
1 Chile 2000 10
2 Chile 2001 11
3 Chile 2002 12
4 Chile 2003 0
Upvotes: 0
Reputation: 3066
Add how='outer
param to the merge.
By default, merge
works with "inner", which means it takes only values which are in both dataframe (intersection) while you want union of those sections.
Also, you may want to add on="ano"
to declare on which column you want to merge. It may not be needed on your case, but it's worth to check it out.
Please check Pandas Merging 101 for more details
Upvotes: 1