Reputation: 22438
I am new to the pandas data analysis library. The task I need to get done is the following:
Given a data frame (table) with a set of columns C
, select all the rows for which the values in a subset of columns A
are identical and combine them to a single row in which the values for the other rows C \ A
are averaged.
(The goal is to test multiple runs of an program and average the values like running time in those cases where the parameters are identical.)
For example:
a b c d
eggs ham 1.0 0.0
eggs ham 0.0 0.5
spam eggs 0.0 1.0
spam eggs 0.0 1.0
should be combined to:
a b c d
eggs ham 0.5 0.25
spam eggs 0.0 1.0
when selecting the columns a
and b
(which means that c
and d
should be averaged).
What are my options?
Upvotes: 0
Views: 435
Reputation: 97291
This is what groupby does
http://pandas.pydata.org/pandas-docs/stable/groupby.html
from StringIO import StringIO
txt = """a b c d
eggs ham 1.0 0.0
eggs ham 0.0 0.5
spam eggs 0.0 1.0
spam eggs 0.0 1.0"""
df = pd.read_csv(StringIO(txt), sep="\\s+")
df.groupby(["a","b"], as_index=False).mean()
Upvotes: 6