Selecting and averaging values in pandas data frames

Question

I am new to the pandas data analysis library. The task I need to get done is the following:

Given a data frame (table) with a set of columns C, select all the rows for which the values in a subset of columns A are identical and combine them to a single row in which the values for the other rows C \ A are averaged.

(The goal is to test multiple runs of an program and average the values like running time in those cases where the parameters are identical.)

For example:

a       b       c       d
eggs    ham     1.0     0.0
eggs    ham     0.0     0.5
spam    eggs    0.0     1.0
spam    eggs    0.0     1.0

should be combined to:

a       b       c       d
eggs    ham     0.5     0.25
spam    eggs    0.0     1.0

when selecting the columns a and b (which means that c and d should be averaged).

What are my options?

HYRY · Accepted Answer

This is what groupby does

http://pandas.pydata.org/pandas-docs/stable/groupby.html

from StringIO import StringIO

txt = """a       b       c       d
eggs    ham     1.0     0.0
eggs    ham     0.0     0.5
spam    eggs    0.0     1.0
spam    eggs    0.0     1.0"""

df = pd.read_csv(StringIO(txt), sep="\s+")
df.groupby(["a","b"], as_index=False).mean()

Selecting and averaging values in pandas data frames

Answers (1)

Related Questions