Apply multiple custom functions to multiple columns on multiple groupby objects in Pandas in Python

Question

I have the following dataframe, groupby objects, and functions.

import pandas as pd

df = pd.DataFrame({
    'A': 'a a b b b'.split(), 
    'P': 'p p p q q'.split(), 
    'B': [1, 2, 3, 4, 5], 
    'C': [4, 6, 5, 7, 8],
    'D': [9, 10, 11, 12, 13]})

g1 = df.groupby('A')

g2 = df.groupby('P')

def f1(x, y):
    return sum(x) + sum(y)

def f2(x, y):
    return sum(x) - sum(y)

def f3(x, y):
    return x * y

For g1, I want to

apply f1 to columns B and C
apply f2 to columns C and D

For g2, I want to

apply f2 to columns B and C
apply f3 to columns C and D

To me, the difficulty lies in the functions, which operate on multiple columns. I also need the functions to work for any arbitrary set of columns; notice how f2 is used for ['B', 'C'] and ['C', 'D']. I'm struggling with the syntax to deal with this.
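To make the target concrete: iterating a groupby object yields (key, sub-DataFrame) pairs, so any two columns of a group can be passed by name to f1, f2, or f3. The sketch below is an illustration, not a recommended pattern. It computes the requested results that way, reusing the df and functions above. The dictionary names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'A': 'a a b b b'.split(),
    'P': 'p p p q q'.split(),
    'B': [1, 2, 3, 4, 5],
    'C': [4, 6, 5, 7, 8],
    'D': [9, 10, 11, 12, 13]})

def f1(x, y):
    return sum(x) + sum(y)

def f2(x, y):
    return sum(x) - sum(y)

def f3(x, y):
    return x * y

# Iterating a GroupBy yields (group key, sub-DataFrame) pairs, so any two
# columns of a group can be handed to a two-argument function.
g1_f1_BC = {key: f1(grp['B'], grp['C']) for key, grp in df.groupby('A')}  # a: 13, b: 32
g1_f2_CD = {key: f2(grp['C'], grp['D']) for key, grp in df.groupby('A')}  # a: -9, b: -16
g2_f2_BC = {key: f2(grp['B'], grp['C']) for key, grp in df.groupby('P')}  # p: -9, q: -6
# f3 is element-wise, so each group yields a Series rather than a scalar
g2_f3_CD = {key: f3(grp['C'], grp['D']) for key, grp in df.groupby('P')}
```

The first three results are one number per group (aggregations). The f3 result is one value per row, which is why it needs different handling from f1 and f2.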

How do I use Pandas to do all of these things in Python?

Acccumulation · Accepted Answer

I don't know if there's a simpler way to do it, but one way is to use currying: each helper takes the column names and returns a function of a single group's DataFrame, which is what GroupBy.apply expects. I couldn't find a way to use the groupby object itself to add a column (groupby operations return new objects rather than modifying the groups), so for f3 I worked with the groups of the groupby object directly. Check whether the following code does what you want:

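# Each *_curry function returns a function of one group's DataFrame
def sum_curry(x, y):
    return lambda group: sum(group[x]) + sum(group[y])

def diff_curry(x, y):
    return lambda group: sum(group[x]) - sum(group[y])

def append_prod(group):
    # Work on a copy so the group's sub-DataFrame is not modified in place
    group = group.copy()
    group['E'] = group['C'] * group['D']
    return group

g1_sums = g1.apply(sum_curry('B', 'C'))
g1_diffs = g1.apply(diff_curry('C', 'D'))
g2_diffs = g2.apply(diff_curry('B', 'C'))
# Iterating g2 yields (group key, sub-DataFrame) pairs
g2_with_prod = [(name, append_prod(group)) for name, group in g2]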
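A more general variant of the same currying idea, offered as a sketch: instead of rewriting f1 and f2 as sum_curry and diff_curry, a single hypothetical helper, apply_pair, wraps any two-argument function and any column pair, so the question's functions are reused unchanged. Selecting only the needed columns before apply keeps the grouping column out of the function (recent pandas versions warn when apply operates on it). Because f3 is element-wise rather than an aggregation, its result does not depend on the grouping, so the sketch adds it as a column before grouping instead of modifying the groups. This assumes a reasonably recent pandas where GroupBy.apply with a scalar-returning function gives one value per group.

```python
import pandas as pd

df = pd.DataFrame({
    'A': 'a a b b b'.split(),
    'P': 'p p p q q'.split(),
    'B': [1, 2, 3, 4, 5],
    'C': [4, 6, 5, 7, 8],
    'D': [9, 10, 11, 12, 13]})

def f1(x, y):
    return sum(x) + sum(y)

def f2(x, y):
    return sum(x) - sum(y)

def f3(x, y):
    return x * y

def apply_pair(grouped, func, x, y):
    """Hypothetical helper: call func(group[x], group[y]) once per group."""
    # Restrict to the two columns so the grouping column is not passed to func
    return grouped[[x, y]].apply(lambda group: func(group[x], group[y]))

g1 = df.groupby('A')
g2 = df.groupby('P')

# One column per (function, column pair); rows are the group keys
g1_result = pd.DataFrame({
    'f1_BC': apply_pair(g1, f1, 'B', 'C'),
    'f2_CD': apply_pair(g1, f2, 'C', 'D'),
})
g2_result = pd.DataFrame({
    'f2_BC': apply_pair(g2, f2, 'B', 'C'),
})

# f3 is row-wise, so compute it on the whole frame and group afterwards
g2_with_prod = df.assign(f3_CD=f3(df['C'], df['D'])).groupby('P')
```

Adding another combination is then one more entry in the dictionary, e.g. apply_pair(g1, f2, 'B', 'D'), without writing a new curried function.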
