Reputation: 287
I have the following dataframe, groupby objects, and functions.
import pandas as pd

df = pd.DataFrame({
    'A': 'a a b b b'.split(),
    'P': 'p p p q q'.split(),
    'B': [1, 2, 3, 4, 5],
    'C': [4, 6, 5, 7, 8],
    'D': [9, 10, 11, 12, 13]})

g1 = df.groupby('A')
g2 = df.groupby('P')

# f1 and f2 reduce two columns to a single number
def f1(x, y):
    return sum(x) + sum(y)

def f2(x, y):
    return sum(x) - sum(y)

# f3 is elementwise, so it returns one value per row
def f3(x, y):
    return x * y
For g1, I want to
For g2, I want to
The difficulty for me is that the functions operate on multiple columns. I also need them to work for an arbitrary set of columns; notice how f2 is used for both ['B', 'C'] and ['C', 'D']. I'm struggling with the syntax for this.
How do I use Pandas to do all of these things in Python?
Upvotes: 0
Views: 660
Reputation: 3591
I don't know if there's a simpler way to do it, but one way is to use currying: write a function that takes the column names and returns a function of the group's dataframe, which you can then pass to apply. I wasn't able to find a way to add a column through the groupby structure itself (the structures involved are designed around immutable data), so for the product I just dealt with the groups in the groupby object directly. You can see whether the following code does what you want:
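# Each *_curry function takes two column names and returns a function of a group
def sum_curry(x, y):
    return lambda df: sum(df[x]) + sum(df[y])

def diff_curry(x, y):
    return lambda df: sum(df[x]) - sum(df[y])

def append_prod(df):
    # assign returns a new dataframe, so the group itself is not modified
    return df.assign(E=df['C'] * df['D'])

g1_sums = g1.apply(sum_curry('B', 'C'))
g1_diffs = g1.apply(diff_curry('C', 'D'))
g2_diffs = g2.apply(diff_curry('B', 'C'))

# Iterating over a groupby yields (key, group dataframe) pairs
g2_with_prod = [(key, append_prod(group)) for key, group in g2]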
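If you would rather reuse f1, f2 and f3 from the question as they are, the same idea can probably be generalized with one wrapper instead of one curried function per operation. This is only a sketch: on_columns is a name I made up, not a pandas function, and the choice of which function goes with which columns is my reading of the question. Since f3 works row by row, I'm also assuming the product does not need the grouping at all, so it is computed on the whole dataframe first.

```python
import pandas as pd

df = pd.DataFrame({
    'A': 'a a b b b'.split(),
    'P': 'p p p q q'.split(),
    'B': [1, 2, 3, 4, 5],
    'C': [4, 6, 5, 7, 8],
    'D': [9, 10, 11, 12, 13]})

def f1(x, y):
    return sum(x) + sum(y)

def f2(x, y):
    return sum(x) - sum(y)

def f3(x, y):
    return x * y

def on_columns(func, *cols):
    """Hypothetical helper: make func take a group and read the named columns from it."""
    return lambda group: func(*(group[c] for c in cols))

# Selecting the columns first means each group only contains what func needs
g1_sums = df.groupby('A')[['B', 'C']].apply(on_columns(f1, 'B', 'C'))
g1_diffs = df.groupby('A')[['C', 'D']].apply(on_columns(f2, 'C', 'D'))
g2_diffs = df.groupby('P')[['B', 'C']].apply(on_columns(f2, 'B', 'C'))

# f3 is elementwise, so the new column can be added before grouping;
# assign returns a new dataframe and leaves df unchanged
with_prod = df.assign(E=f3(df['C'], df['D']))
g2_with_prod = list(with_prod.groupby('P'))

print(g1_sums)   # a: 13, b: 32
print(g2_diffs)  # p: -9, q: -6
```

Each of the three apply calls returns a Series indexed by the group key, and g2_with_prod is a list of (key, dataframe) pairs like in the code above it.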
Upvotes: 1