Bishonen_PL
Bishonen_PL

Reputation: 1571

DataFrame arithmetic based on subset

Having a dataframe similar to:

Country Column1 Product Week        Val
UK      S1      A       2019-36     10
UK      S1      A       2019-37     20
UK      S1      A       2019-38     30
UK      S1      B       2019-36     30
UK      S1      B       2019-37     30
UK      S1      B       2019-38     30
DE      S1      A       2019-39     100
DE      S1      A       2019-40     100
DE      S1      A       2019-41     100
DE      S1      B       2019-36     10
DE      S1      B       2019-37     15
DE      S1      B       2019-38     10

How does one go about saying: If Product = "B" then take the VAL from product "A" where all other columns are the same (Country, Column1 & Week) and add 50% of that Val to the current value?

E.g. The first "B" would have a value of 35:

30 + (50%*10)

The second one 40:

30 + (50%*20)

and the third one 45:

30 + (50%*30)

Upvotes: 0

Views: 68

Answers (2)

aafulei
aafulei

Reputation: 2225

What about playing with indices?

Suppose you have your data in a pandas.DataFrame called data

data = data.set_index(["Country", "Column1", "Week", "Product"], drop=False)
df1 = data[data.Product == "A"].set_index(["Country", "Column1", "Week"], drop=False)
df2 = data[data.Product == "B"].set_index(["Country", "Column1", "Week"], drop=False)
df2.Val += df1.Val * .5  # so that rows with all else the same would add
df2 = df2.set_index(["Country", "Column1", "Week", "Product"])
data.update(df2)
data["Index"] = range(len(data.Val))
data = data.set_index("Index")
data.index.name = None

I think the advantage of this approach is that it fully meets your demand and achieve the result in place. It produces

   Country Column1 Product     Week    Val
0       UK      S1       A  2019-36   10.0
1       UK      S1       A  2019-37   20.0
2       UK      S1       A  2019-38   30.0
3       UK      S1       B  2019-36   35.0
4       UK      S1       B  2019-37   40.0
5       UK      S1       B  2019-38   45.0
6       DE      S1       A  2019-39  100.0
7       DE      S1       A  2019-40  100.0
8       DE      S1       A  2019-41  100.0
9       DE      S1       B  2019-36   10.0
10      DE      S1       B  2019-37   15.0
11      DE      S1       B  2019-38   10.0

Upvotes: 0

Yuca
Yuca

Reputation: 6091

Using pd.Groupby on 'Country','Column1','Week' conditional on Column2 twice seems to work:

B = df[df['Column2']=='B'].groupby(['Country','Column1','Week']).sum()
A = df[df['Column2']=='A'].groupby(['Country','Column1','Week']).sum() 
0.5*A + B

Output

                          Val
Country Column1 Week         
DE      S1      2019-36   NaN
                2019-37   NaN
                2019-38   NaN
                2019-39   NaN
                2019-40   NaN
                2019-41   NaN
UK      S1      2019-36  35.0
                2019-37  40.0
                2019-38  45.0

This only works if there's unique values per choice of Country, Column1, Week

Upvotes: 1

Related Questions