Reputation: 147
So I have these 2 dataframes
df1
title URL number date
a /url-1 1 21-02-2020
a /url-1 10 20-02-2020
a /url-1 17 19-02-2020
b /url-2 100 21-02-2020
b /url-2 106 20-02-2020
df2
URL number date
/url-1 5 21-02-2020
/url-1 12 20-02-2020
/url-1 50 19-02-2020
/url-3 9 21-02-2020
/url-3 11 20-02-2020
So I need to perform these actions to combine them into 1 dataframe:
1) Outer join (correct me if I'm wrong) df1 and df2 but I don't know how to show the results in 4 columns as I need to group by "URL" and "date" columns (will be illustrated below).
2) Aggregate df1["number"] and df2["number"] and sum them together
This is the result I want:
new_df
title URL number date
a /url-1 6 21-02-2020
a /url-1 22 20-02-2020
a /url-1 67 19-02-2020
b /url-2 100 21-02-2020
b /url-2 106 20-02-2020
null /url-3 9 21-02-2020
null /url-3 11 20-02-2020
Your help is much appreciated! :)
Upvotes: 1
Views: 40
Reputation: 28709
use pandas merge, set the how option as outer, and merge both dataframes on URL and date.
Sum the two numbers and keep only the columns that are relevant.
M = df1.merge(df2,how='outer', on=['URL','date'])
M['number'] = M.number_x.fillna(0).add(M.number_y.fillna(0))
M.filter(['title','URL','number','date'])
title URL number date
0 a /url-1 6.0 21-02-2020
1 a /url-1 22.0 20-02-2020
2 a /url-1 67.0 19-02-2020
3 b /url-2 100.0 21-02-2020
4 b /url-2 106.0 20-02-2020
5 NaN /url-3 9.0 21-02-2020
6 NaN /url-3 11.0 20-02-2020
Upvotes: 1