democracii
democracii

Reputation: 89

Using SUMIF Function in Pandas

I'm trying to calculate Datafame value with Pandas.

But I couldn't calculate "FW_result" and "SS_result".

df1:
category  num  standard
  FW      U-1    3
  FW      U-2    3
  FW      U-3    2
  SS      U-4    3
  SS      U-5    2


df2:
name    U-1 U-2 U-3 U-4 U-5
 A       3   1   1   2   1
 B       2   3   2   1   2
 C       2   2   2   3   1


The desired result would be: 

name    U-1 U-2 U-3 U-4 U-5 FW_result   SS_Result
 A       3   1   1   2   1     *(63%)       #(60%)
 B       2   3   2   1   2     **(88%)      ##(60%)
 C       2   2   2   3   1     ***(75%)     ###(80%)

I want to fill the * and # values. The FW_result refers to df1 So, (df2 value)/(standard sum)

Example *:

(3+1+1)/(3+3+2) =>63%

Example #:

(2+1) /(3+2)   => 60%

and I want to express the result to %(persent)

Upvotes: 2

Views: 170

Answers (3)

Ch3steR
Ch3steR

Reputation: 20669

If nums don't overlap in categories you can use this.

You can use df.groupby here, then use use GroupBy.sum, then divide those values using df.div and multiply with 100 to get percentage values using df.mul to get desired results.

g = df1.groupby('category')['standard'].sum()
m = df2.groupby(df2.columns.map(df1.set_index('num')['category']),axis=1).sum()
               # Mapping to convert U-1 to FW, U-2 to FW based on df1
               # If you know it's always the order as df1 then you can simply
               #_.groupby.(df1['category'].tolist(), axis=1)

df2[['FW_res', 'SS_res']] = m.div(g).mul(100)

If they overlap then try this.

df1
#  category  num  standard
#0       FW  U-1         3
#1       FW  U-2         3
#2       FW  U-3         2
#3       SS  U-4         3
#4       SS  U-5         2
#5       SS  U-1         4
p = df1.pivot_table(index='category', columns='num', values='standard', aggfunc='sum')
p
#num       U-1  U-2  U-3  U-4  U-5
#category
#FW        3.0  3.0  2.0  NaN  NaN
#SS        4.0  NaN  NaN  3.0  2.0

# Now, little bit of Numpy boardcasting.
#df2.se_index('name') # if index is not name else ignore this step.
vs = np.nansum(df2.values[:,None]+p.values, axis=-1)
v = p.sum(axis=1).values
out = (vs - v) / v
df2[['FW_res', 'SS_res']] = pd.DataFrame(out*100, index=df2.index)

Upvotes: 2

jezrael
jezrael

Reputation: 862751

For general solution also with overlapping values in num for both new columns use:

Idea is filter df1 for fw and ss values, then use DataFrame.reindex by num values, sum and divide by sum of original standard columns:

df3 = df1.groupby('category').agg({'num':list, 'standard':'sum'})
print (df3)
                      num  standard
category                           
FW        [U-1, U-2, U-3]         8
SS             [U-4, U-5]         5

for col in df3.index:
    df2[f'{col}_result'] = (df2.reindex(df3.loc[col, 'num'], axis=1).sum(axis=1)
                               .div(df3.loc[col, 'standard']).mul(100))
print (df2)
  name  U-1  U-2  U-3  U-4  U-5  FW_result  SS_result
0    A    3    1    1    2    1       62.5       60.0
1    B    2    3    2    1    2       87.5       60.0
2    C    2    2    2    3    1       75.0       80.0

Upvotes: 2

sammywemmy
sammywemmy

Reputation: 28679

You could try a combination of melt, merge and groupby to reshape the data and get the sums :

df2[['FW_result', 'SS_result']] = (df1.merge(df2.melt("name", var_name="num"), on="num")
                                   .assign(num=lambda x: x.num.str[0])
                                   .groupby(["name", "category", "num"])
                                   .pipe(lambda x: x.value.sum() / x.standard.sum())
                                   .mul(100)
                                   .unstack("category").to_numpy())


  name  U-1 U-2 U-3 U-4 U-5 FW_result   SS_result
0   A   3   1   1   2   1   62.5        60.0
1   B   2   3   2   1   2   87.5        60.0
2   C   2   2   2   3   1   75.0        80.0

Upvotes: 0

Related Questions