Reputation: 2693
I've got a `DataFrame` which has occasional missing values, and looks something like this:
        Monday  Tuesday  Wednesday
================================================
Mike    42      NaN      12
Jenna   NaN     NaN      15
Jon     21      4        1
I'd like to add a new column to my data frame where I'd calculate the average across all columns for every row.
Meaning, for Mike, I'd need (df['Monday'] + df['Wednesday'])/2, but for Jenna, I'd simply use df['Wednesday']/1.
Does anyone know the best way to account for this variation that results from missing values and calculate the average?
Upvotes: 85
Views: 177303
Reputation: 413
Using the apply method (with numpy imported as np):
df['avg'] = df[['Monday', 'Tuesday']].apply(np.nanmean, axis=1)
Upvotes: -2
Reputation: 320
Resurrecting this question because all previous answers currently print a warning.
In most cases, use assign():
df = df.assign(avg=df.mean(axis=1))
For specific columns, one can input them by name:
df = df.assign(avg=df.loc[:, ["Monday", "Tuesday", "Wednesday"]].mean(axis=1))
Or by position with iloc. The end of the slice is exclusive, so use one more than the last desired column index:
df = df.assign(avg=df.iloc[:, 0:3].mean(axis=1))
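To make the assign() variants above concrete, here is a minimal sketch that rebuilds the frame from the question and checks that selecting by name and by position give the same result. The variable names (day_cols, with_avg, by_pos) are made up for illustration, and the frame is reconstructed by hand from the table in the question.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the question; NaN marks the missing values.
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Select the columns by name (hypothetical helper list).
day_cols = ["Monday", "Tuesday", "Wednesday"]
with_avg = df.assign(avg=df.loc[:, day_cols].mean(axis=1))

# Select the same columns by position; 0:3 covers columns 0, 1 and 2.
by_pos = df.assign(avg=df.iloc[:, 0:3].mean(axis=1))

# assign() returns a new DataFrame, so the original df has no 'avg' column.
print(with_avg)
print("avg" in df.columns)
```

Naming the columns explicitly also means that running the line again later, once an avg column exists, does not fold the old avg into the new mean.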
Upvotes: 7
Reputation: 2529
Alternative - using iloc (can also use loc here):
df['avg'] = df.iloc[:,0:2].mean(axis=1)
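Note that the slice 0:2 is half-open, so it picks up only the first two columns. A small sketch, assuming the frame from the question (rebuilt here by hand; first_two and avg_first_two are illustrative names):

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the question.
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# iloc slices exclude the end position: 0:2 selects columns 0 and 1 only.
first_two = df.iloc[:, 0:2]
print(list(first_two.columns))

# Row-wise mean over Monday and Tuesday; a row with no values at all stays NaN.
avg_first_two = first_two.mean(axis=1)
print(avg_first_two)
```

To include Wednesday as well, the slice would need to be 0:3.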
Upvotes: 13
Reputation: 42875
You can simply:
df['avg'] = df.mean(axis=1)
       Monday  Tuesday  Wednesday        avg
Mike       42      NaN         12  27.000000
Jenna     NaN      NaN         15  15.000000
Jon        21        4          1   8.666667
This works because .mean() ignores missing values by default (skipna=True): see docs.
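As a quick illustration of that default, the sketch below rebuilds the frame from the question and compares the default behaviour with skipna=False. The names avg_skip and avg_strict are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the question.
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Default: skipna=True, so each row is averaged over its non-missing values.
avg_skip = df.mean(axis=1)

# With skipna=False, any missing value in a row makes that row's mean NaN.
avg_strict = df.mean(axis=1, skipna=False)

print(pd.DataFrame({"skipna=True": avg_skip, "skipna=False": avg_strict}))
```

With the default, Mike is averaged over two values and Jenna over one, which is the behaviour the question asks for.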
To select a subset, you can:
df['avg'] = df[['Monday', 'Tuesday']].mean(axis=1)
       Monday  Tuesday  Wednesday   avg
Mike       42      NaN         12  42.0
Jenna     NaN      NaN         15   NaN
Jon        21        4          1  12.5
Upvotes: 188