Wookeun Lee
Wookeun Lee

Reputation: 463

Variable shift in a pandas dataframe

import pandas as pd

df = pd.DataFrame({'A':[3,5,3,4,2,3,2,3,4,3,2,2,2,3],
                   'B':[10,20,30,40,20,30,40,10,20,30,15,60,20,15]})

    A   B
0   3  10
1   5  20
2   3  30
3   4  40
4   2  20
5   3  30
6   2  40
7   3  10
8   4  20
9   3  30
10  2  15
11  2  60
12  2  20
13  3  15

I'd like to append a C column, containing rolling average of B (rolling period = A).

For example, the C value at row index(2) should be df.B.rolling(3).mean() = mean(10,20,30), and the C value at row index(4) should be df.B.rolling(2).mean() = mean(40,20).

Upvotes: 3

Views: 265

Answers (1)

piRSquared
piRSquared

Reputation: 294248

probably stupid slow... but this get's it done

def crazy_apply(row):
    p = df.index.get_loc(row.name)
    a = row.A
    return df.B.iloc[p-a+1:p+1].mean()

df.apply(crazy_apply, 1)

0           NaN
1           NaN
2     20.000000
3     25.000000
4     30.000000
5     30.000000
6     35.000000
7     26.666667
8     25.000000
9     20.000000
10    22.500000
11    37.500000
12    40.000000
13    31.666667
dtype: float64

explanation
apply iterates through each column or each row. We iterate through each row because we used the parameter axis=1 (see 1 as the second argument in the call to apply). So every iteration of apply passes the a pandas series object that represents the current row. the current index value is in the name attribute of the row. The index of the row object is the same as the columns of df.

So, df.index.get_loc(row.name) finds the ordinal position of the current index value held in row.name. row.A is the column A for that row.

Upvotes: 2

Related Questions