Reputation: 385
I have a (493, 20) pandas DataFrame and want to compute a conditional np.nanmean() for each row. The condition is that each value included in a row's mean must lie above a lower threshold and below an upper threshold, both defined relative to that row's median. Here's my current setup:
filt_avg_data = np.nanmean(data_tsl.apply(func=lambda x: x[(x < maxval*np.median(x)) & (x > minval*np.median(x))], axis=1), axis=1)
where maxval = 10, minval = 0.1, and data_tsl.shape == (493, 20). This works.
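To make the baseline easy to reproduce, here is a self-contained sketch of the .apply() version. The random lognormal data standing in for data_tsl is hypothetical; it only needs enough spread that some values fall outside the band between 0.1 * median and 10 * median. The sketch also shows why the final np.nanmean works: each row returns only its in-band values, and pandas re-aligns those shorter rows on the column labels, filling the dropped positions with NaN.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for data_tsl: positive, skewed values so some fall outside the band
data_tsl = pd.DataFrame(rng.lognormal(sigma=1.5, size=(493, 20)))
maxval, minval = 10, 0.1

# Each row returns only its in-band values; pandas re-aligns the rows on the
# column labels, so the dropped positions come back as NaN.
filtered = data_tsl.apply(
    lambda x: x[(x < maxval * np.median(x)) & (x > minval * np.median(x))], axis=1
)
print(filtered.isna().sum().sum())  # number of values excluded by the band

filt_avg_data = np.nanmean(filtered, axis=1)
print(filt_avg_data.shape)  # (493,)
```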
However, I want to vectorize this operation; I don't want to use df.apply(). I tried
data_tsl > np.median(data_tsl, axis=1)
to create a mask of values on which I can then run np.nanmean(), but I can't get each row of data_tsl
to line up with its own median value. This is the error I get: ValueError: operands could not be broadcast together with shapes (493,2) (493,)
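The error follows from NumPy's broadcasting rules: shapes are compared from the trailing axis, so a (493,) array of row medians is lined up against the 20 columns rather than the 493 rows. The sketch below reproduces this on a hypothetical random array standing in for data_tsl (comparing the DataFrame directly may give a differently worded pandas error, depending on the version) and shows two ways to make the medians line up with rows: adding a length-1 axis in NumPy, or passing axis=0 to the pandas comparison method DataFrame.gt.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for data_tsl
data_tsl = pd.DataFrame(rng.lognormal(sigma=1.5, size=(493, 20)))

arr = data_tsl.to_numpy()
med = np.median(arr, axis=1)    # one median per row
print(arr.shape, med.shape)     # (493, 20) (493,)

# Trailing axes are compared first: 20 vs 493, so this comparison fails.
try:
    arr > med
except ValueError as err:
    print("broadcast error:", err)

# med[:, None] has shape (493, 1), which stretches across the 20 columns.
mask_np = arr > med[:, None]

# pandas equivalent: align the per-row Series with the index (axis=0).
mask_pd = data_tsl.gt(data_tsl.median(axis=1), axis=0)
```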
How can I vectorize this operation? The similar questions I found weren't actually about vectorizing; they only asked how to get the .apply() call to work.
Upvotes: 0
Views: 142
Reputation: 221634
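If you have NaNs in the input data, I would think you want to use np.nanmedian to ignore NaNs in the median calculation. Going with that, we can combine the masks for the upper and lower thresholds, use them to set the out-of-range values to NaN as well, and finally use np.nanmean. Indexing the per-row thresholds with [:,None] reshapes them from (493,) to (493,1), so each threshold broadcasts across its own row instead of being lined up against the 20 columns, which is what went wrong in the question's attempt: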
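import numpy as np

a = data_tsl.to_numpy(dtype=float, copy=True)  # float copy: input df is not edited, and NaN can be assigned
med = np.nanmedian(a, axis=1)                  # per-row median ignoring NaNs, shape (493,)
U = maxval*med                                 # per-row upper threshold
L = minval*med                                 # per-row lower threshold
a[(a >= U[:, None]) | (a <= L[:, None])] = np.nan  # out-of-band values become NaN
out = np.nanmean(a, axis=1)                    # mean of the remaining values in each row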
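One way to check the vectorized version is to compare it with a plain row-by-row computation on synthetic data. This sketch assumes hypothetical lognormal data with a few NaNs inserted in place of data_tsl. It also shows a pure-pandas variant built on DataFrame.median, lt/gt with axis=0, and DataFrame.where; that variant is an alternative, not part of the answer's code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical data with some NaNs to exercise the nan-aware functions
data_tsl = pd.DataFrame(rng.lognormal(sigma=1.5, size=(493, 20)))
data_tsl.iloc[::7, 3] = np.nan
maxval, minval = 10, 0.1

# Vectorized NumPy version (the approach from the answer)
a = data_tsl.to_numpy(dtype=float, copy=True)
med = np.nanmedian(a, axis=1)
a[(a >= (maxval * med)[:, None]) | (a <= (minval * med)[:, None])] = np.nan
out_np = np.nanmean(a, axis=1)

# Pure-pandas variant: median() and mean() skip NaN by default,
# and where() replaces values outside the mask with NaN.
med_s = data_tsl.median(axis=1)
keep = data_tsl.lt(maxval * med_s, axis=0) & data_tsl.gt(minval * med_s, axis=0)
out_pd = data_tsl.where(keep).mean(axis=1)

# Row-by-row reference; comparisons with NaN are False, so NaNs are dropped too.
out_loop = []
for row in data_tsl.to_numpy():
    m = np.nanmedian(row)
    kept = row[(row < maxval * m) & (row > minval * m)]
    out_loop.append(kept.mean())
out_loop = np.array(out_loop)

print(np.allclose(out_np, out_pd.to_numpy()), np.allclose(out_np, out_loop))
```

If a row is entirely NaN, np.nanmedian and np.nanmean return NaN for it and emit a RuntimeWarning, whereas the pandas variant returns NaN without a warning.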
Upvotes: 2