Reputation: 4784
I have a data frame that one of its columns represents how many corns produced in this time stamp. for example
timestamp corns_produced another_column
1 5 4
2 0 1
3 0 3
4 3 4
The dataframe is big.. 100,000+ rows
I want to calculate moving average
and std
for 1000 time stamps of corn_produced
.
Luckily it is pretty easy using rolling
:
my_df.rolling(1000).mean()
my_df.rolling(1000).std()
.But the problem is I want to ignore the zeros, meaning if in the last 1000 timestamps there are only 5 instances in which corn was produced, I want to do the mean
and std
on those 5 elements.
How do I ignore the zeros ?
Just to clarify, I don't want to do the following x = my_df[my_df['corns_produced'] != 0]
, and than do rolling
on x
, because it ignores the time stamps and doesn't give me the result I need
Upvotes: 0
Views: 1741
Reputation: 1
A faster solution: first set all zeros to np.nan
, then take a rolling mean. If you are dealing with large data, it will be much faster
Upvotes: 0
Reputation: 862691
You can use Rolling.apply
:
print (my_df.rolling(1000).apply(lambda x: x[x!= 0].mean()))
print (my_df.rolling(1000).apply(lambda x: x[x!= 0].std()))
Upvotes: 2