Arlowest
Arlowest

Reputation: 55

Create rolling average in dataframe until a set point

I have a dataframe like this:

month val1 val2 val3
1      2    3    5
2      3    4    7
3      5    1    2
4      7    4    3
5      2    6    4
6      2    2    2

The last month in my initial column is 6 here, but could be anything from month 1 to month 12. I want to calculate a rolling average based on the last 2 values, for each val column until month 12. To get something like this:

month val1 val2 val3
1      2    3    5
2      3    4    7
3      5    1    2
4      7    4    3
5      2    6    4
6      2    2    2
7      2    4    3
8      2    3    2.5
9      2   3.5   2.75
10     2   3.25  2.63
11     2   3.38  2.69
12     2   3.32  2.66

Upvotes: 1

Views: 447

Answers (2)

Valdi_Bo
Valdi_Bo

Reputation: 30981

Define the following function, generating the rows for the rest of the current year, based on last 2 rows:

def getRest(last2):
    last2 = last2.set_index('month')
    lastMonth = last2.index[1]
    rv = []
    for mnth in range(lastMonth, 12):
        newRow = last2.mean()
        newRow.name = mnth + 1
        rv.append(newRow)
        last2 = last2.drop([mnth - 1])
        last2 = last2.append(newRow)
    return rv

Then invoke it the following way, concatenating with the original DataFrame:

pd.concat([df, pd.concat(getRest(df.iloc[-2:]), axis=1).T.reset_index()
    .rename(columns={'index': 'month'})], ignore_index=True)

The result is:

    month  val1    val2     val3
0       1   2.0  3.0000  5.00000
1       2   3.0  4.0000  7.00000
2       3   5.0  1.0000  2.00000
3       4   7.0  4.0000  3.00000
4       5   2.0  6.0000  4.00000
5       6   2.0  2.0000  2.00000
6       7   2.0  4.0000  3.00000
7       8   2.0  3.0000  2.50000
8       9   2.0  3.5000  2.75000
9      10   2.0  3.2500  2.62500
10     11   2.0  3.3750  2.68750
11     12   2.0  3.3125  2.65625

If you want, save this result under either the original variable or another one.

Upvotes: 0

Derek O
Derek O

Reputation: 19565

The main problem is that appending rows to dataframes is a very inefficient process (i.e. creating a new dataframe series each iteration and appending it to the original dataframe will be extremely costly).

Probably the best way to do this is to create an array from the dataframe, do the rolling calculations there, and convert the result into a new dataframe.

import pandas as pd
import numpy as np 

# create dataframe with the first month removed to show the solution is generalizable
df = pd.DataFrame({'month':[2,3,4,5,6],'val1':[3,5,7,2,2],'val2':[4,1,4,6,2],'val3':[7,2,3,4,2]})

df
   month  val1  val2  val3
0      2     3     4     7
1      3     5     1     2
2      4     7     4     3
3      5     2     6     4
4      6     2     2     2

# extract values of the dataframe as numpy and perform rolling operations
# separate out months from other columns
array_values = df.drop(columns = 'month').values

# loop from most recent month to month 12 
for month in range(df.month.iloc[-1],12):
    array_values = np.append(array_values, np.apply_along_axis(np.mean, 0,array_values[-2:]).reshape(1,3), axis = 0)

array_months = np.append(df.month.values, np.arange(df.month.values[-1]+1,13,1))
array_months = array_months.reshape(len(array_months),1)
array_values = np.append(array_months, array_values, axis = 1)

new_df = pd.DataFrame(data = array_values, columns = df.columns)
new_df.month = new_df.month.astype('int')

Output:

new_df
    month  val1    val2     val3
0       2   3.0  4.0000  7.00000
1       3   5.0  1.0000  2.00000
2       4   7.0  4.0000  3.00000
3       5   2.0  6.0000  4.00000
4       6   2.0  2.0000  2.00000
5       7   2.0  4.0000  3.00000
6       8   2.0  3.0000  2.50000
7       9   2.0  3.5000  2.75000
8      10   2.0  3.2500  2.62500
9      11   2.0  3.3750  2.68750
10     12   2.0  3.3125  2.65625

Upvotes: 0

Related Questions