LinnK
LinnK

Reputation: 395

Add column to the end of Pandas DataFrame containing average of previous data

I have a DataFrame ave_data that contains the following:

ave_data

Time        F7           F8            F9  
00:00:00    43.005593    -56.509746    25.271271  
01:00:00    55.114918    -59.173852    31.849262  
02:00:00    63.990762    -64.699492    52.426017

I want to add another column to this dataframe, containing the average of the values under column F7, F8 and F9 for each row.

The ave_data DataFrame might change size as my code reads from different Excel files later, so the method needs to be generic (i.e add the column containing the average always as the last column in the DataFrame, not in column number 4)

desired output

Time        F7           F8            F9           Average
00:00:00    43.005593    -56.509746    25.271271    4.25  
01:00:00    55.114918    -59.173852    31.849262    9.26
02:00:00    63.990762    -64.699492    52.426017    17.24

Upvotes: 24

Views: 81965

Answers (4)

johnDanger
johnDanger

Reputation: 2333

df.assign is specifically for this purpose. It returns a copy to avoid changing the original dataframe and/or raising SettingWithCopyWarning. It works as follows:

data_with_avg = avg_data.assign(
    average = avg_data.mean(axis=1, numeric_only=True)
)

This function can also create multiple columns at the same time:

data_with_ave = avg_data.assign(
    average = avg_data.mean(axis=1, numeric_only=True),
    median = avg_data.median(axis=1, numeric_only=True)
)

As of pandas 0.36, you can even reference a newly created column to create another:

data_with_ave = avg_data.assign(
    average = avg_data.mean(axis=1, numeric_only=True),
    isLarge = lambda df: df['average'] > 10
)

Upvotes: 3

Sergey  Zaitsev
Sergey Zaitsev

Reputation: 585

In common case if you would like to use specific columns, you can use:

df['average'] = df[['F7','F8']].mean(axis=1)

where axis=1 stands for rowwise action (using column values for each row to calculate the mean in 'average' column)

Then you may want to sort by this column:

df.sort_values(by='average',ascending=False, inplace=True)

where inplace=True stands for applying action to dataframe instead of calculating on the copy.

Upvotes: 11

kt-0
kt-0

Reputation: 367

@LaangeHaare or anyone else who is curious, I just tested it and the copy part of the accepted answer seems unnecessary (maybe I am missing something...)

so you could simplify this with:

df['average'] = df.mean(numeric_only=True, axis=1)

I would have simply added this as a comment but don't have the reputation

Upvotes: 17

EdChum
EdChum

Reputation: 394051

You can take a copy of your df using copy() and then just call mean and pass params axis=1 and numeric_only=True so that the mean is calculated row-wise and to ignore non-numeric columns, when you do the following the column is always added at the end:

In [68]:

summary_ave_data = df.copy()
summary_ave_data['average'] = summary_ave_data.mean(numeric_only=True, axis=1)
summary_ave_data
Out[68]:
                 Time         F7         F8         F9    average
0 2015-07-29 00:00:00  43.005593 -56.509746  25.271271   3.922373
1 2015-07-29 01:00:00  55.114918 -59.173852  31.849262   9.263443
2 2015-07-29 02:00:00  63.990762 -64.699492  52.426017  17.239096

Upvotes: 27

Related Questions