Reputation: 4388
I have two datasets, df1 and df2. If the source and date value in df1, match the id and date column in df2, I would like to append two new columns 'source count' and 'source pw' creating a new dataset. The source count is created by grouping the source values and taking the sum of the 'pw' column and then taking the count of the 'role' column.
Data
df1
source pw role leaving date
aa 10 hello bb q122
aa 10 hello bb q122
bb 5 bye aa q222
bb 1 ok aa q322
df2
id date
aa q122
aa q222
aa q322
bb q122
bb q222
bb q322
bb q422
Desired
id date source count source pw
aa q1 22 2 20
aa q2 22 0 0
aa q3 22 0 0
bb q1 22 0 0
bb q2 22 1 5
bb q3 22 1 1
bb q4 22 0 0
Logic
For the source 'aa' values, we see values 10 and 10 under the 'pw' column and this has a count of 2. The source is aa and the date is q1 22. These values are then appended to df2.
Doing
I am first creating a dictionary to match certain values #I am not sure if this is the correct approach...
mapping
df2['source count']=df2['aa'].map({
'q1 22':'' })
Take a groupby sum and count
a= df1.groupby(['source'])['pw'].sum()
a = df1.groupby(['source'])['role'].cumcount()
append dataset
pd.concat([df1,df2], axis=1)
I am still troubleshooting, any suggestion is appreciated.
Upvotes: 2
Views: 225
Reputation: 24304
use groupby()
+agg()
for calculation then merge the result by merge()
method:
df1['date']=df1['date'].str[:2]+' '+df1['date'].str[2:]
out=(df1.groupby(['source','date'])['pw'].agg(['count','sum'])
.reset_index()
.rename(columns={'source':'id','count':'source count','sum':'source pw'}))
out=df2.merge(out,on=['id','date'],how='left').fillna(0,downcast='infer')
output of out
:
id date source count source pw
0 aa q1 22 2 20
1 aa q2 22 0 0
2 aa q3 22 0 0
3 bb q1 22 0 0
4 bb q2 22 1 5
5 bb q3 22 1 1
6 bb q4 22 0 0
Upvotes: 1