Jiang Xu
Jiang Xu

Reputation: 101

How to do rolling calculation based a a subsets of rows

I am doing rolling mean analysis on a pandas dataframe,

R1_Chr_Name distance    count
0   chr1    100 163
1   chr1    101 203
2   chr1    102 194
3   chr1    103 193
4   chr1    104 154
5   chr2    100 150
6   chr2    101 152
7   chr2    102 163
8   chr2    103 161
9   chr2    104 170
10  chr3    100 154
11  chr3    101 160
12  chr3    102 175
13  chr3    103 134
14  chr3    104 151

I want to add a column named 'average_count' that will do rolling mean within a subgroup of rows, e.g. do rolling mean of 'count' when column 'R1_Chr_Name' equals to chr1, chr2, ch3 ..., The disired result should look like:

R1_Chr_Name distance    count   average_count
0   chr1    100 163 NaN
1   chr1    101 203 NaN
2   chr1    102 194 186.666667
3   chr1    103 193 196.666667
4   chr1    104 154 180.333333
5   chr2    100 150 NaN
6   chr2    101 152 NaN
7   chr2    102 163 155.000000
8   chr2    103 161 158.666667
9   chr2    104 170 164.666667
10  chr3    100 154 NaN
11  chr3    101 160 NaN
12  chr3    102 175 163.000000
13  chr3    103 134 156.333333
14  chr3    104 151 153.333333

Currently I am using the following code and found previous calculations were overwritten:

chr_ls = ['chr1', 'chr2', 'chr3']
for chrom in chr_ls:
    df['average_count']=df[df['R1_Chr_Name']==chrom]['count'].rolling(3).mean()
    print(df)

R1_Chr_Name  distance  count  average_count
0         chr1       100    163            NaN
1         chr1       101    203            NaN
2         chr1       102    194            NaN
3         chr1       103    193            NaN
4         chr1       104    154            NaN
5         chr2       100    150            NaN
6         chr2       101    152            NaN
7         chr2       102    163            NaN
8         chr2       103    161            NaN
9         chr2       104    170            NaN
10        chr3       100    154            NaN
11        chr3       101    160            NaN
12        chr3       102    175     163.000000
13        chr3       103    134     156.333333
14        chr3       104    151     153.333333

So, how to do it correctly?

Upvotes: 0

Views: 36

Answers (1)

BENY
BENY

Reputation: 323366

Check with groupby

df['ave_count']=df.groupby('R1_Chr_Name')['count'].rolling(3).mean().reset_index(level=0,drop=True)
df
Out[232]: 
   R1_Chr_Name  distance  count   ave_count
0         chr1       100    163         NaN
1         chr1       101    203         NaN
2         chr1       102    194  186.666667
3         chr1       103    193  196.666667
4         chr1       104    154  180.333333
5         chr2       100    150         NaN
6         chr2       101    152         NaN
7         chr2       102    163  155.000000
8         chr2       103    161  158.666667
9         chr2       104    170  164.666667
10        chr3       100    154         NaN
11        chr3       101    160         NaN
12        chr3       102    175  163.000000
13        chr3       103    134  156.333333
14        chr3       104    151  153.333333

Upvotes: 2

Related Questions