groupby and stats on lists

Question

I have a dataframe that looks like this:

'Location'    'Dir' 'Set'     'H1'    'H2'
0   Chicago     H1     4    *LIST*  *LIST*
1   Houston     H2     4    *LIST*  *LIST*
2   Los Angeles H2     4    *LIST*  *LIST*
3   Boston      H1     0    *LIST*  *LIST*
4   NYC         H2     0    *LIST*  *LIST*
5   Seattle     H1     0    *LIST*  *LIST*

All list items are NNx1 lists.

What I would like is to obtain the element-wise mean (again an NNx1 list) of each Set, where the list taken from each row is the one named in that row's 'Dir' column.

For example, for Set 4 I would want the mean of Chicago's H1, Houston's H2, and Los Angeles's H2 lists. I would also like the mean +/- sigma (one standard deviation).

Concretely, assuming:

Chicago H1 is [4,8,10]

Houston H2 is [8,4,12]

Los Angeles H2 is [6,9,5]

My mean would be [6,7,9]
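As a quick check of the arithmetic above: stacking the three Set 4 lists into a 2-D NumPy array and reducing along axis 0 (down the rows) gives the element-wise mean, and the same reduction with `std` gives the sigma band. This is a minimal sketch; it assumes the population standard deviation (`ddof=0`, NumPy's default), so pass `ddof=1` if sigma is meant as the sample standard deviation.

```python
import numpy as np

# The Set 4 lists from the example: Chicago H1, Houston H2, Los Angeles H2
stacked = np.array([
    [4, 8, 10],
    [8, 4, 12],
    [6, 9, 5],
])

# Reduce down the rows (axis=0) to get one value per list position
mean = stacked.mean(axis=0)   # [6. 7. 9.]
sigma = stacked.std(axis=0)   # population std (ddof=0); an assumption

# Lower and upper bands: mean - sigma and mean + sigma
lower = mean - sigma
upper = mean + sigma
print(mean, lower, upper)
```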

I thought the .groupby method would be useful, but I don't know how to make the selection depend on the 'Dir' column, or how to ask it for the average of lists.

Any ideas?

cmaher · Accepted Answer

You can get the element-wise mean of your filtered groups as shown below. A few intermediate steps are needed (reshaping the data and converting the lists to NumPy arrays), but they should yield the lists (or arrays) of means that you want.

import numpy as np
import pandas as pd

# melt H1 and H2 columns into key-value columns
# this will make it easier to select either the H1 or H2 list
df = pd.melt(df, id_vars=['Location', 'Set', 'Dir'],
             value_vars=['H1', 'H2'], var_name='Target_Dir', value_name='Values')

# convert lists to numpy arrays
# in order to be able to specify the axis for the mean calculation
df['Values'] = df['Values'].apply(np.array)

# filter df to your target Dirs, group by Set
# and calculate element-wise means: stack each group's arrays
# into a 2-D array first, then average down the rows (axis=0)
df[df['Dir'] == df['Target_Dir']].groupby('Set')['Values'].apply(
    lambda x: np.mean(np.stack(x.tolist()), axis=0))
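The answer above stops at the mean, while the question also asks for mean +/- sigma. The following is a minimal end-to-end sketch that reuses the same melt-and-filter steps and computes both for each Set. The sample frame is an assumption: only the three Set 4 lists selected by 'Dir' come from the question, and every other list is a hypothetical placeholder. It also assumes the population standard deviation (`ddof=0`); the names `long_df`, `selected`, and `stats` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame in the question's layout. Only the Set 4 lists selected by
# 'Dir' come from the question; all other lists are hypothetical placeholders.
df = pd.DataFrame({
    "Location": ["Chicago", "Houston", "Los Angeles", "Boston", "NYC", "Seattle"],
    "Dir": ["H1", "H2", "H2", "H1", "H2", "H1"],
    "Set": [4, 4, 4, 0, 0, 0],
    "H1": [[4, 8, 10], [0, 0, 0], [0, 0, 0], [1, 2, 3], [0, 0, 0], [3, 2, 1]],
    "H2": [[0, 0, 0], [8, 4, 12], [6, 9, 5], [0, 0, 0], [2, 2, 2], [0, 0, 0]],
})

# Same reshaping as the answer: one row per (Location, Target_Dir) pair
long_df = pd.melt(df, id_vars=["Location", "Set", "Dir"],
                  value_vars=["H1", "H2"], var_name="Target_Dir", value_name="Values")

# Keep only the list whose column name matches the row's Dir
selected = long_df[long_df["Dir"] == long_df["Target_Dir"]]

# For each Set: stack its lists into an (n_rows, NN) array, then reduce over rows
stats = {}
for set_id, values in selected.groupby("Set")["Values"]:
    stacked = np.stack(values.tolist())
    mean = stacked.mean(axis=0)
    sigma = stacked.std(axis=0)  # population std; pass ddof=1 for sample std
    stats[int(set_id)] = {
        "mean": mean,
        "minus_sigma": mean - sigma,
        "plus_sigma": mean + sigma,
    }

for set_id, s in stats.items():
    print(set_id, s["minus_sigma"], s["mean"], s["plus_sigma"])
```

Looping over the groups keeps each result as a plain NumPy array instead of packing arrays into a pandas object column, which sidesteps how pandas treats array-valued cells; collecting the results with `groupby(...).apply(...)` as in the answer should also work for the mean alone.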
