Chs

Reputation: 129

Sum column by column with if condition in pandas

I'm sorry, but I'm bad with words, so I'll just give an example of what I'm looking for.

ID    date1   date2   date3   date4   date5   date6
001   0       5       10      15      5       40
002   0       20      50      0       10      15
003   5       15      5       30      10      0

If I have this dataframe of item prices, is there a way to sum it column by column (a running total across each row), apply an if condition, and update the values in the dataframe? For example: if the running sum is < 10, keep the same amount; if it is >= 10, return 2x the amount; and if it is >= 50, return 3x the amount.

I want to get this dataframe as result:

ID    date1   date2   date3   date4   date5   date6
001   0       5       20      30      10      120
002   0       40      150     0       30      45
003   5       30      10      90      30      0
    

In the case of ID 001, date2 stays the same because the running sum is < 10, but date3 is 10 x 2 because the running total is 5 + 10 = 15, which is >= 10, so the amount is doubled. The same goes for date4 and date5, and date6 is x3 because the running total reaches >= 50.
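To spell the rule out, here is a minimal sketch of the arithmetic for the first row (plain Python, added only to illustrate the running-total logic described above):

    row = [0, 5, 10, 15, 5, 40]   # prices for ID 001
    running = 0
    result = []
    for amount in row:
        running += amount          # running total across the row
        if running >= 50:
            result.append(amount * 3)
        elif running >= 10:
            result.append(amount * 2)
        else:
            result.append(amount)
    print(result)                  # [0, 5, 20, 30, 10, 120]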

Upvotes: 3

Views: 317

Answers (4)

G.G

Reputation: 765

    def function1(ss: pd.Series):
        # walk the row left to right, comparing each value with the
        # previous (already-updated) value rather than a true running total
        for i in range(1, len(ss)):
            if ss.iloc[i] + ss.iloc[i - 1] >= 50:
                ss.iloc[i] *= 3
            elif ss.iloc[i] + ss.iloc[i - 1] >= 10:
                ss.iloc[i] *= 2
        return ss

    df1.apply(function1, axis=1)
    
   date1  date2  date3  date4  date5  date6
0      0      5     20     30     10    120
1      0     40    150      0     20     30
2      5     30     10     60     30      0

Upvotes: 0

Georgina Skibinski

Reputation: 13387

You can map values in a vectorized manner, casting bool to int:

x = df.iloc[:, 1:].cumsum(axis=1)
df.iloc[:, 1:] = (x.ge(50).astype(int) + x.ge(10).astype(int) + 1).mul(df.iloc[:, 1:])

So: (x.ge(50).astype(int) + x.ge(10).astype(int) + 1)

This maps to 3 when the cumulative sum is at least 50, to 2 when it is at least 10, and otherwise to 1.
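For example, on the sample data the intermediate multiplier frame works out as follows (a quick sketch, assuming df is the frame from the question with ID as its first column):

x = df.iloc[:, 1:].cumsum(axis=1)
multiplier = x.ge(50).astype(int) + x.ge(10).astype(int) + 1
print(multiplier)
#    date1  date2  date3  date4  date5  date6
# 0      1      1      2      2      2      3
# 1      1      2      3      3      3      3
# 2      1      2      2      3      3      3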

Outputs:

    ID  date1  date2  date3  date4  date5  date6
0  001      0      5     20     30     10    120
1  002      0     40    150      0     30     45
2  003      5     30     10     90     30      0

Upvotes: 1

Carl F. Corneil

Reputation: 192

Using applymap

I might have misunderstood the question; this is for when you just want to apply a calculation to every cell, without any awareness of the running sum across the columns.

You can define your own custom function and then apply it to each cell with pandas.DataFrame.applymap(function_name).

import pandas as pd

# DataFrame creation
cols = ['date1', 'date2', 'date3', 'date4', 'date5', 'date6']
index = ['001', '002', '003']
data = [[0,  5, 10, 15,  5, 40],
        [0, 20, 50,  0, 10, 15],
        [5, 15,  5, 30, 10,  0]]
df = pd.DataFrame(data, columns=cols, index=index)
df.index.name = 'ID'

# Define the custom function
def custom_multiply(x):
    if x < 10:
        return x
    elif x < 50:
        return x * 2
    else:  # x >= 50
        return x * 3

# Apply the function to every cell and reassign to df
df = df.applymap(custom_multiply)
df
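For reference, on the sample frame this per-cell rule gives roughly the following (sketched output; it differs from the result the question asks for because each cell is scaled on its own value rather than on a running total):

     date1  date2  date3  date4  date5  date6
ID
001      0      5     20     30      5     80
002      0     40    150      0     20     30
003      5     30      5     60     20      0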

Upvotes: 0

Shubham Sharma

Reputation: 71689

You can filter the date-like columns, take the cumsum of these columns along axis=1, and finally use np.select with the specified conditions and corresponding choices to get the final result:

import numpy as np

s = df.filter(like='date')
cs = s.cumsum(axis=1)  # cumulative sum along axis=1
df[s.columns] = np.select([cs.ge(50), cs.ge(10)], [s * 3, s * 2], s)

    ID  date1  date2  date3  date4  date5  date6
0  001      0      5     20     30     10    120
1  002      0     40    150      0     30     45
2  003      5     30     10     90     30      0
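Note that np.select evaluates the conditions in order and returns the choice for the first one that is True, so listing cs.ge(50) before cs.ge(10) ensures a running total of 50 or more is tripled rather than doubled; the final argument (s) is the default used where neither condition holds.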

Upvotes: 7
