Reputation: 755
I have the following dataframe:
df = pd.DataFrame({'a': [5003, 54.06, 53.654, 55.2], 'b': [np.nan, 54.1121, 53.98, 55.12], 'c': [np.nan, 2, 53.322, 54.99],
'd': [np.nan, 53.1, 53.212, 55.002], 'e': [np.nan, 53, 53.2, 55.021], 'f': [np.nan, 53.11, 53.120, 55.3]})
I want to compute the rolling mean of each column (say, rolling(1).mean()) and then the 95% confidence interval for each entry, CI = x ± z*s/sqrt(n), where x is the rolling mean, z is the critical value for the chosen confidence level, s is the rolling standard deviation (say, rolling(1).std()), and n is the number of entries in the column.
Can this be done pythonically without loops?
Thank you.
Upvotes: 0
Views: 1302
Reputation: 11
I believe this should do the trick, albeit 2 months late ;-)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
N = 3 # window size
Z = 1.960 # z critical value for a 95% confidence interval
df = pd.DataFrame({
'a': [55.2, 54.06, 53.654, 55.2],
'b': [np.nan, 54.1121, 53.98, 55.12],
'c': [np.nan, 50, 53.322, 54.99],
'd': [np.nan, 53.1, 53.212, 55.002],
'e': [np.nan, 53, 53.2, 55.021],
'f': [np.nan, 53.11, 53.120, 55.3]
})
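movMean = df.rolling(window=N, center=True, min_periods=1).mean()
movStd = df.rolling(window=N, center=True, min_periods=1).std()
# non-NaN observations actually present in each window; with min_periods=1
# this can be smaller than N at the edges or where values are missing
movCount = df.rolling(window=N, center=True, min_periods=1).count()
# get mean +/- Z*std/sqrt(n), with n taken per window
confIntPos = movMean + Z * movStd / np.sqrt(movCount)
confIntNeg = movMean - Z * movStd / np.sqrt(movCount)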
# plot a as example column
plt.plot(movMean['a'], label='rolling mean')
plt.plot(confIntPos['a'], color='red', ls='--', label='95% CI')
plt.plot(confIntNeg['a'], color='red', ls='--')
plt.legend()
plt.show()
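If you need this for more than one dataframe, the same steps can be wrapped in a small helper. This is only a sketch: the name rolling_ci and its parameters are my own, not anything pandas provides, and it assumes you want n to be the number of non-NaN values in each window rather than the nominal window size.

```python
import numpy as np
import pandas as pd


def rolling_ci(df, window=3, z=1.960):
    """Hypothetical helper: rolling mean plus lower/upper CI bounds per column."""
    roll = df.rolling(window=window, center=True, min_periods=1)
    mean = roll.mean()
    std = roll.std()    # sample std (ddof=1); NaN when a window holds fewer than 2 values
    n = roll.count()    # non-NaN observations in each window
    half_width = z * std / np.sqrt(n)
    return mean, mean - half_width, mean + half_width


df = pd.DataFrame({
    'a': [55.2, 54.06, 53.654, 55.2],
    'b': [np.nan, 54.1121, 53.98, 55.12],
})
mean, lower, upper = rolling_ci(df)
```

All three results have the same shape as df, so you can plot or join them column by column. Where a window contains only one value the standard deviation is undefined, so the bounds come out as NaN there.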
Upvotes: 1