Reputation: 395
I have a DataFrame ave_data
that contains the following:
ave_data
Time F7 F8 F9
00:00:00 43.005593 -56.509746 25.271271
01:00:00 55.114918 -59.173852 31.849262
02:00:00 63.990762 -64.699492 52.426017
I want to add another column to this dataframe, containing the average of the values under column F7, F8 and F9 for each row.
The ave_data
DataFrame might change size as my code reads from different Excel files later, so the method needs to be generic (i.e add the column containing the average always as the last column in the DataFrame, not in column number 4)
desired output
Time F7 F8 F9 Average
00:00:00 43.005593 -56.509746 25.271271 4.25
01:00:00 55.114918 -59.173852 31.849262 9.26
02:00:00 63.990762 -64.699492 52.426017 17.24
Upvotes: 24
Views: 81965
Reputation: 2333
df.assign
is specifically for this purpose. It returns a copy to avoid changing the original dataframe and/or raising SettingWithCopyWarning
. It works as follows:
data_with_avg = avg_data.assign(
average = avg_data.mean(axis=1, numeric_only=True)
)
This function can also create multiple columns at the same time:
data_with_ave = avg_data.assign(
average = avg_data.mean(axis=1, numeric_only=True),
median = avg_data.median(axis=1, numeric_only=True)
)
As of pandas 0.36, you can even reference a newly created column to create another:
data_with_ave = avg_data.assign(
average = avg_data.mean(axis=1, numeric_only=True),
isLarge = lambda df: df['average'] > 10
)
Upvotes: 3
Reputation: 585
In common case if you would like to use specific columns, you can use:
df['average'] = df[['F7','F8']].mean(axis=1)
where axis=1 stands for rowwise action (using column values for each row to calculate the mean in 'average' column)
Then you may want to sort by this column:
df.sort_values(by='average',ascending=False, inplace=True)
where inplace=True stands for applying action to dataframe instead of calculating on the copy.
Upvotes: 11
Reputation: 367
@LaangeHaare or anyone else who is curious, I just tested it and the copy part of the accepted answer seems unnecessary (maybe I am missing something...)
so you could simplify this with:
df['average'] = df.mean(numeric_only=True, axis=1)
I would have simply added this as a comment but don't have the reputation
Upvotes: 17
Reputation: 394051
You can take a copy of your df using copy()
and then just call mean
and pass params axis=1
and numeric_only=True
so that the mean is calculated row-wise and to ignore non-numeric columns, when you do the following the column is always added at the end:
In [68]:
summary_ave_data = df.copy()
summary_ave_data['average'] = summary_ave_data.mean(numeric_only=True, axis=1)
summary_ave_data
Out[68]:
Time F7 F8 F9 average
0 2015-07-29 00:00:00 43.005593 -56.509746 25.271271 3.922373
1 2015-07-29 01:00:00 55.114918 -59.173852 31.849262 9.263443
2 2015-07-29 02:00:00 63.990762 -64.699492 52.426017 17.239096
Upvotes: 27