Spectre
Spectre

Reputation: 23

Python/Pandas: sort by date and compute two week (rolling?) average

So far I've read in 2 CSV's and merged them based on a common element. I take the output of the merged CSV and iterate through the unique element they've been merged on. While I have them separated I want to generate a daily count line and a two week rolling average from the current date going backward. I cannot index based of the 'Date Opened' field but I still need my outputs organized by this with the most recent first. Once these are sorted by date my daily count plotting issue will be rectified. My remaining task would be to compute a two week rolling average for count within the week. I've looked into the Pandas documentation and I think the rolling_mean will work but the parameters of this function don't really make sense to me. I've tried biwk_avg = pd.rolling_mean(open_dt, 28) but that doesnt seem to work. I know there is an easier way to do this but I think I've hit a roadblock with the documentation available. The end result should look something like this graph. Right now my daily count graph isnt sorted(even though I think I've instructed it to) and is unusable in line form.

def data_sort():
    data_merge = data_extract()
    domains  = data_merge.groupby('PWx Domain')
    for domain in domains.groups.items():
        dsort = (data_merge.loc[domain[1]])
        print (dsort.head())
        open_dt = pd.to_datetime(dsort['Date Opened']).dt.date
        #open_dt.to_csv('output\''+str(domain)+'_out.csv', sep = ',')
        open_ct = open_dt.value_counts(sort= False) 
        biwk_avg = pd.rolling_mean(open_ct, 28)
        plt.plot(open_ct,'bo')
        plt.show()

data_sort()

Upvotes: 1

Views: 613

Answers (1)

DYZ
DYZ

Reputation: 57085

Rolling mean alone is not enough in your case; you need a combination of resampling (to group data by days) followed by a 14-day rolling mean (why do you use 28 in your code?). Something like thins:

for _,domain in data_merge.groupby('PWx Domain'):
    # Convert date to the index
    domain.index = pd.to_datetime(domain['Date Opened'])
    # Sort dy dates
    domain.sort_index(inplace=True) 
    # Do the averaging
    rolling = pd.rolling_mean(domain.resample('1D').mean(), 14)
    plt.plot(rolling,'bo')
    plt.show()

Upvotes: 1

Related Questions