rescot
rescot

Reputation: 335

How to map pandas Groupby dataframe with sum values to another dataframe using non-unique column

I have two pandas dataframe df1 and df2. Where i need to find df1['seq'] by doing a groupby on df2 and taking the sum of the column df2['sum_column']. Below are sample data and my current solution.

df1

id    code amount  seq
234     3    9.8    ?
213     3    18
241     3    6.4
543     3    2
524     2    1.8
142     2    14
987     2    11
658     3    17

df2

c_id  name role    sum_column
1     Aus  leader    6
1     Aus  client    1
1     Aus  chair     7
2     Ned  chair     8
2     Ned  leader    3
3     Mar  client    5
3     Mar  chair     2
3     Mar  leader    4

grouped = df2.groupby('c_id')['sum_column'].sum()
df3 = grouped.reset_index()

df3

c_id  sum_column
 1      14
 2      11
 3      11

The next step where am having issues is to map the df3 to df1 and conduct a conditional check to see if df1['amount'] is greater then df3['sum_column'].

df1['seq'] = np.where(df1['amount'] > df1['code'].map(df3.set_index('c_id')[sum_column]), 1, 0)

printing out df1['code'].map(df3.set_index('c_id')['sum_column']), I get only NaN values.

Does anyone know what am doing wrong here?

Expected results: df1

id    code amount  seq
234     3    9.8    0
213     3    18     1
241     3    6.4    0
543     3    2      0
524     2    1.8    0
142     2    14     1
987     2    11     0
658     3    17     1

Upvotes: 1

Views: 1826

Answers (2)

BENY
BENY

Reputation: 323356

You forget add quote for sum_column

df1['seq']=np.where(df1['amount'] > df1['code'].map(df3.set_index('c_id')['sum_column']), 1, 0)

Upvotes: 1

jezrael
jezrael

Reputation: 863301

Solution should be simplify with remove .reset_index() for df3 and pass Series to map:

s = df2.groupby('c_id')['sum_column'].sum()
df1['seq'] = np.where(df1['amount'] > df1['code'].map(s), 1, 0)

Alternative with casting boolean mask to integer for True, False to 1,0:

df1['seq'] = (df1['amount'] > df1['code'].map(s)).astype(int)

print (df1)
    id  code  amount  seq
0  234     3     9.8    0
1  213     3    18.0    1
2  241     3     6.4    0
3  543     3     2.0    0
4  524     2     1.8    0
5  142     2    14.0    1
6  987     2    11.0    0
7  658     3    17.0    1

Upvotes: 3

Related Questions