Reputation: 1352
I am trying to combine two dataframes with each other. But I am stuck on the logic (maybe I looked too long to the problem).
Problem: DF1 is in a format like this:
Country ID March April May June
Netherlands A 10 20 15 14
Netherlands B 15 18 10 4
Germany C 9 8 3 1
Then, I have another dataframe:
DF2
Country ID Month Number
Netherlands A March 8
Netherlands A March 28
Netherlands B April 18
Germany C Jan 9
I would like to merge these two columns (in the same format as DF1). My assumption was that is should be something like, group by (Country + ID), and then (Month as column, and SUM the Number). - but as said before, I am confussed.
Desired output:
Country ID March April May June df2_Jan df2_March df2_April
Netherlands A 10 20 15 14 0 36 #8+28 0
Netherlands B 15 18 10 4 0 0 18
Germany C 9 8 3 1 9 0 0
Hope that you can help me. Thanks a lot in advance!
Upvotes: 1
Views: 393
Reputation: 11405
Summing in DF2 indeed is a simple groupby().sum()
operation
df2 = df2.groupby(['Country', 'ID', 'Month'])['Number'].sum()
You now have a series with 3-level index.
It seems to make DF2 to the same format as DF1 you only need a single .unstack()
which pivots one of the index levels to columns.
df2 = df2.unstack('Month', fill_value=0)
Merging in same format is now easy
df1.merge(df2.add_prefix('df2_').reset_index(), on=['Country', 'ID'])
Steps 1 and 2 can be replace with a single call to .pivot_table
with the aggfunc
parameter, as per @Nk03’s answer.
Upvotes: 1
Reputation: 14949
You can try:
merged_df = (
df1.merge(
df2.pivot_table(
index=['Country', 'ID'],
columns='Month',
values='Number',
aggfunc=sum,
fill_value=0)
.add_prefix('df2_')
.reset_index(),
on=['Country', 'ID']
)
)
OUTPUT:
Country ID March April May June df2_April df2_Jan df2_March
0 Netherlands A 10 20 15 14 0 0 36
1 Netherlands B 15 18 10 4 18 0 0
2 Germany C 9 8 3 1 0 9 0
Upvotes: 1