Reputation: 83
I cant get my head around Merge and joins.
I have an initial data frame, that looks like below:
Car Units Month
BMW 800 Jan
Lexus 150 Jan
Audi 210 Feb
Mustang 90 Mar
Bentley 110 Feb
Jaguar 90 Jan
I will successively get more dataframes to append to this initial df.
And I will have to append only those values which are unique in 'Car' column and also 'Month' column.
Let's say I got another dataframe to append:
Car Units Month
BMW 130 Jan
Tata 150 Jan
Maruti 110 Mar
Audi 250 Mar
Desired output: When I append 2nd dataframe to the first one, 'Tata', 'Maruti', 'Audi' should get appended. 'BMW' should not get appended.
The code I am using is:
import pandas as pd
df1 = pd.DataFrame(
{
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],"Units": [800, 150, 210, 90, 110, 90], "Month": ['Jan','Jan', 'Feb', 'Mar', 'Feb', 'Jan']
}
)
df2 = pd.DataFrame(
{
"Car": ['BMW', 'Tata', 'Maruti', 'Audi'],"Units": [130, 150, 110, 250], "Month": ['Jan','Jan','Mar', 'Mar']
}
)
df1 = pd.merge(df1, df2, on=['Car', 'Month'], how="left", indicator=False)
But somehow it doesn't work.
Upvotes: 0
Views: 76
Reputation: 2598
Using "left" will only affect to using just the keys on the left for ['Car', 'Month']
.
One solution is to use outer
to have all the combinations of ['Car', 'Month']
keys. I set the indicator to true the origin of the value and also the left to have no suffix so we can take advantage of this later:
joined=pd.merge(df1, df2, on=['Car', 'Month'],
how="outer", indicator=True, suffixes=["", "_y"])
>> joined
Car Units Month Units_y _merge
0 BMW 800.0 Jan 130.0 both
1 Lexus 150.0 Jan NaN left_only
2 Audi 210.0 Feb NaN left_only
3 Mustang 90.0 Mar NaN left_only
4 Bentley 110.0 Feb NaN left_only
5 Jaguar 90.0 Jan NaN left_only
6 Tata NaN Jan 150.0 right_only
7 Maruti NaN Mar 110.0 right_only
8 Audi NaN Mar 250.0 right_only
Selecting the units columns:
units = joined.filter(regex="^Units", axis=1)
>> units
Units Units_y
0 800.0 130.0
1 150.0 NaN
2 210.0 NaN
3 90.0 NaN
4 110.0 NaN
5 90.0 NaN
6 NaN 150.0
7 NaN 110.0
8 NaN 250.0
Then, since you want the results on the left
to take precedence, you could fill the Nan
in the horizontal direction, so if there are any values on Units
(the value coming from the left), they will be preserved:
filled_units = units.fillna(axis=1, method='bfill')
>> filled_units
Units Units_y
0 800.0 130.0
1 150.0 NaN
2 210.0 NaN
3 90.0 NaN
4 110.0 NaN
5 90.0 NaN
6 150.0 150.0
7 110.0 110.0
8 250.0 250.0
Finally, we can slice filled_units and combine it with the values for Car
and Month
in joined
:
pd.concat([joined[['Car', 'Month']], filled_units["Units"]], axis=1)
Car Month Units
0 BMW Jan 800.0
1 Lexus Jan 150.0
2 Audi Feb 210.0
3 Mustang Mar 90.0
4 Bentley Feb 110.0
5 Jaguar Jan 90.0
6 Tata Jan 150.0
7 Maruti Mar 110.0
8 Audi Mar 250.0
Hope it helps
Upvotes: 1