Lynn
Lynn

Reputation: 4388

Match certain values between two separate datasets and create new columns using python

I have two datasets, df1 and df2. If the source and date value in df1, match the id and date column in df2, I would like to append two new columns 'source count' and 'source pw' creating a new dataset. The source count is created by grouping the source values and taking the sum of the 'pw' column and then taking the count of the 'role' column.

Data

df1

source  pw  role    leaving date 
aa      10  hello   bb      q122
aa      10  hello   bb      q122
bb      5   bye     aa      q222
bb      1   ok      aa      q322

df2

id  date
aa  q122
aa  q222
aa  q322
bb  q122
bb  q222
bb  q322
bb  q422

Desired

id  date    source count    source pw
aa  q1 22   2               20
aa  q2 22   0               0
aa  q3 22   0               0
bb  q1 22   0               0
bb  q2 22   1               5
bb  q3 22   1               1
bb  q4 22   0               0

Logic

For the source 'aa' values, we see values 10 and 10 under the 'pw' column and this has a count of 2. The source is aa and the date is q1 22. These values are then appended to df2.

Doing

I am first creating a dictionary to match certain values #I am not sure if this is the correct approach...

mapping

df2['source count']=df2['aa'].map({
'q1 22':'' })

Take a groupby sum and count

a= df1.groupby(['source'])['pw'].sum()
a = df1.groupby(['source'])['role'].cumcount()

append dataset

pd.concat([df1,df2], axis=1) 

I am still troubleshooting, any suggestion is appreciated.

Upvotes: 2

Views: 225

Answers (1)

Anurag Dabas
Anurag Dabas

Reputation: 24304

use groupby()+agg() for calculation then merge the result by merge() method:

df1['date']=df1['date'].str[:2]+' '+df1['date'].str[2:]
out=(df1.groupby(['source','date'])['pw'].agg(['count','sum'])
        .reset_index()
        .rename(columns={'source':'id','count':'source count','sum':'source pw'}))
out=df2.merge(out,on=['id','date'],how='left').fillna(0,downcast='infer')

output of out:

    id  date    source count    source pw
0   aa  q1 22   2               20
1   aa  q2 22   0               0
2   aa  q3 22   0               0
3   bb  q1 22   0               0
4   bb  q2 22   1               5
5   bb  q3 22   1               1
6   bb  q4 22   0               0

Upvotes: 1

Related Questions