Reputation: 343
I would like to get the average value of a row in a dataframe where I only use values greater than or equal to zero.
For example: if my dataframe looked like:
df = pd.DataFrame([[3,4,5], [4,5,6],[4,-10,6]])
3 4 5
4 5 6
4 -10 6
currently if I get the average of the row I write :
df['mean'] = df.mean(axis = 1)
and get:
3 4 5 4
4 5 6 5
4 -10 6 0
I would like to get a dataframe that only used values greater than zero to computer the average. I would like a dataframe that looked like:
3 4 5 4
4 5 6 5
4 -10 6 5
In the above example -10 is excluded in the average. Is there a command that excludes the -10?
Upvotes: 5
Views: 5496
Reputation: 294576
Not nearly as succinct as @Psidom. But if you wanted to use numpy
and get some added quickness.
v0 = df.values
v1 = np.where(v0 > 0, v0, np.nan)
v2 = np.nanmean(v1, axis=1)
df.assign(Mean=v2)
0 1 2 Mean
0 3 4 5 4.0
1 4 5 6 5.0
2 4 -10 6 5.0
Timing
small data
%timeit df.assign(Mean=df[df > 0].mean(1))
1000 loops, best of 3: 1.71 ms per loop
%%timeit
v0 = df.values
v1 = np.where(v0 > 0, v0, np.nan)
v2 = np.nanmean(v1, axis=1)
df.assign(Mean=v2)
1000 loops, best of 3: 407 µs per loop
Upvotes: 4
Reputation: 215137
You can use df[df > 0]
to query the data frame before calculating the average; df[df > 0]
returns a data frame where cells smaller or equal to zero will be replaced with NaN
and get ignored when calculating the mean
:
df[df > 0].mean(1)
#0 4.0
#1 5.0
#2 5.0
#dtype: float64
Upvotes: 7