OopsUser
OopsUser

Reputation: 4784

Rolling mean and standard deviation without zeros

I have a data frame that one of its columns represents how many corns produced in this time stamp. for example

timestamp corns_produced another_column
1           5                  4
2           0                  1
3           0                  3
4           3                  4
 

The dataframe is big.. 100,000+ rows

I want to calculate moving average and std for 1000 time stamps of corn_produced. Luckily it is pretty easy using rolling :

But the problem is I want to ignore the zeros, meaning if in the last 1000 timestamps there are only 5 instances in which corn was produced, I want to do the mean and std on those 5 elements.

How do I ignore the zeros ?

Just to clarify, I don't want to do the following x = my_df[my_df['corns_produced'] != 0], and than do rolling on x, because it ignores the time stamps and doesn't give me the result I need

Upvotes: 0

Views: 1741

Answers (2)

Mr. AlmostRight
Mr. AlmostRight

Reputation: 1

A faster solution: first set all zeros to np.nan, then take a rolling mean. If you are dealing with large data, it will be much faster

Upvotes: 0

jezrael
jezrael

Reputation: 862691

You can use Rolling.apply:

print (my_df.rolling(1000).apply(lambda x: x[x!= 0].mean()))
print (my_df.rolling(1000).apply(lambda x: x[x!= 0].std()))

Upvotes: 2

Related Questions