bikuser

Reputation: 2093

How to count consecutive values per month in a time series?

I have a long daily time series from 1981 to 1991. I have successfully counted the longest run of consecutive zero values in each month using the code below.

However, when I try to count the longest run of non-zero values by changing != to ==, it does not work for the monthly grouping, although it works fine for the yearly grouping. Can anyone help me with this?

def func(group):
    return (group.prec != 0).astype(int).cumsum().value_counts().values[0] - 1

df.groupby(['year', 'month']).apply(func)

(credit: Jianxun Li)
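The function above uses a compare-cumsum pattern. The label increases at every non-zero value, so each run of zeros shares one label with the non-zero value just before it, and the - 1 removes that value. A run at the very start of a group has no such value, so the - 1 undercounts it. Here is a minimal sketch on a made-up six-day series that starts with three dry days:

```python
import pandas as pd

# Hypothetical daily precipitation for one month: it starts with three dry days
prec = pd.Series([0.0, 0.0, 0.0, 1.2, 0.0, 0.5])

# Compare-cumsum labels from the question: the label increases at every non-zero value
labels = (prec != 0).astype(int).cumsum()  # [0, 0, 0, 1, 1, 2]

# Largest label count minus 1, exactly as in the question's func
question_result = labels.value_counts().values[0] - 1  # 2, but the longest dry spell is 3 days
```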

Upvotes: 0

Views: 102

Answers (1)

Jianxun Li

Reputation: 24742

I see the problem here. For the compare-cumsum pattern, .value_counts() returns something like this:

3    8
0    5
9    3
6    2
8    1
7    1
4    1
1    1
dtype: int64

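Indexing this result is easy to get wrong because its index holds integer labels (the cumsum values). Use .iloc[0] to take the first, and therefore largest, count by position. The function below also keeps only the non-zero rows ([group.prec != 0]) before calling .value_counts(), so the - 1 correction is no longer needed. That correction undercounts a run that begins on the first day of a group, because such a run has no preceding zero to absorb the - 1. This is likely why the monthly results were off.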
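Because the index holds integer labels, a plain [0] looks up the label 0 rather than the first row. The minimal sketch below shows the difference. It rebuilds the counts printed above by hand, with the values copied from that output. Note that .values[0] indexes the underlying NumPy array and is therefore positional as well.

```python
import pandas as pd

# The value_counts() result printed above: index = run label, value = run length
counts = pd.Series([8, 5, 3, 2, 1, 1, 1, 1], index=[3, 0, 9, 6, 8, 7, 4, 1])

# With an integer index, square brackets look up the *label* 0
by_label = counts[0]  # 5, the length of the run labelled 0

# .iloc always works by position, so it returns the largest count
by_position = counts.iloc[0]  # 8
```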

import pandas as pd
import numpy as np

# simulate some artificial data
# ============================================
np.random.seed(0)
df = pd.DataFrame(np.random.randn(4000), columns=['prec'], index=pd.date_range('1981-01-01', periods=4000, freq='D'))
df['prec'] = np.where(df['prec'] > 0, df['prec'], 0.0)
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day

df
              prec  year  month  day
1981-01-01  1.7641  1981      1    1
1981-01-02  0.4002  1981      1    2
1981-01-03  0.9787  1981      1    3
1981-01-04  2.2409  1981      1    4
1981-01-05  1.8676  1981      1    5
1981-01-06  0.0000  1981      1    6
1981-01-07  0.9501  1981      1    7
1981-01-08  0.0000  1981      1    8
...            ...   ...    ...  ...
1991-12-07  0.0653  1991     12    7
1991-12-08  0.0000  1991     12    8
1991-12-09  0.3949  1991     12    9
1991-12-10  0.0000  1991     12   10
1991-12-11  1.7796  1991     12   11
1991-12-12  0.0000  1991     12   12
1991-12-13  1.5771  1991     12   13
1991-12-14  0.0000  1991     12   14

[4000 rows x 4 columns]



# processing
# ===============================
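def func(group):
    # Label increases at every zero, so each run of non-zero values shares one label
    labels = (group.prec == 0).astype(int).cumsum()
    # Keep only non-zero rows; value_counts() sorts descending, so iloc[0] is the longest run
    return labels[group.prec != 0].value_counts().iloc[0]

df.groupby(['year', 'month']).apply(func)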

year  month
1981  1         8
      2         3
      3         4
      4        10
      5         5
               ..
1991  8         3
      9         5
      10        3
      11        6
      12        2
dtype: int64
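The function does not handle a month with no non-zero values at all. In that case value_counts() returns an empty Series and .iloc[0] raises an IndexError. Below is a minimal sketch of a hypothetical helper, longest_run (not part of pandas), that returns 0 in that case and works for both wet and dry spells. The six-day DataFrame is made up for illustration.

```python
import pandas as pd


def longest_run(mask: pd.Series) -> int:
    """Length of the longest run of consecutive True values in a boolean Series.

    Hypothetical helper: returns 0 instead of raising IndexError when mask has no True values.
    """
    # Label increases at every False, so each run of True values shares one label
    labels = (~mask).astype(int).cumsum()
    counts = labels[mask].value_counts()
    return int(counts.iloc[0]) if len(counts) else 0


# Made-up data: January ends with a 3-day wet spell then a dry day; February is completely dry
demo = pd.DataFrame(
    {'prec': [0.5, 1.0, 2.0, 0.0, 0.0, 0.0]},
    index=pd.date_range('1981-01-28', periods=6, freq='D'),
)
keys = [demo.index.year, demo.index.month]

wet = demo.groupby(keys)['prec'].apply(lambda s: longest_run(s != 0))  # Jan: 3, Feb: 0
dry = demo.groupby(keys)['prec'].apply(lambda s: longest_run(s == 0))  # Jan: 1, Feb: 2
```

On the simulated data, the wet-spell version would be called as df.groupby(['year', 'month'])['prec'].apply(lambda s: longest_run(s != 0)).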


# double check on a particular group
# ======================================================
group = df.groupby(['year', 'month']).get_group((1981,1))
group

              prec  year  month  day
1981-01-01  1.7641  1981      1    1
1981-01-02  0.4002  1981      1    2
1981-01-03  0.9787  1981      1    3
1981-01-04  2.2409  1981      1    4
1981-01-05  1.8676  1981      1    5
1981-01-06  0.0000  1981      1    6
1981-01-07  0.9501  1981      1    7
1981-01-08  0.0000  1981      1    8
1981-01-09  0.0000  1981      1    9
1981-01-10  0.4106  1981      1   10
1981-01-11  0.1440  1981      1   11
1981-01-12  1.4543  1981      1   12
1981-01-13  0.7610  1981      1   13
1981-01-14  0.1217  1981      1   14
1981-01-15  0.4439  1981      1   15
1981-01-16  0.3337  1981      1   16
1981-01-17  1.4941  1981      1   17
1981-01-18  0.0000  1981      1   18
1981-01-19  0.3131  1981      1   19
1981-01-20  0.0000  1981      1   20
1981-01-21  0.0000  1981      1   21
1981-01-22  0.6536  1981      1   22
1981-01-23  0.8644  1981      1   23
1981-01-24  0.0000  1981      1   24
1981-01-25  2.2698  1981      1   25
1981-01-26  0.0000  1981      1   26
1981-01-27  0.0458  1981      1   27
1981-01-28  0.0000  1981      1   28
1981-01-29  1.5328  1981      1   29
1981-01-30  1.4694  1981      1   30
1981-01-31  0.1549  1981      1   31

(group.prec == 0).astype(int).cumsum()[group.prec != 0].value_counts().iloc[0]

# output: 8
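For an independent check of that 8, here is a minimal sketch that does not use cumsum labels. It re-creates January 1981 with the same seed and simulation steps as above, then scans the values in plain Python with itertools.groupby. The name longest_nonzero_run is hypothetical.

```python
from itertools import groupby

import numpy as np

# Re-create the simulated series exactly as above and keep January 1981 (first 31 days)
np.random.seed(0)
raw = np.random.randn(4000)
prec = np.where(raw > 0, raw, 0.0)[:31]


def longest_nonzero_run(values):
    longest = 0
    # groupby yields consecutive runs sharing the same key (True = non-zero)
    for is_wet, run in groupby(values, key=lambda v: v != 0):
        if is_wet:
            longest = max(longest, sum(1 for _ in run))
    return longest


check = longest_nonzero_run(prec)  # 8, matching the pandas result
```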

Edit:

To count consecutive non-zero values, pass this function to apply:

def func(group):
    return (group.prec == 0).astype(int).cumsum()[group.prec != 0].value_counts().iloc[0]
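The same filtering should also fix the question's original dry-spell count. There, the - 1 undercounts a dry spell that starts on the first day of a month. Below is a minimal sketch with the two comparisons swapped (func_zero is a hypothetical name), checked on a made-up six-day group that starts with three dry days:

```python
import pandas as pd


def func_zero(group):
    # Label increases at every non-zero value; keeping only the zero rows
    # makes each label's count exactly the length of one dry spell
    return (group.prec != 0).astype(int).cumsum()[group.prec == 0].value_counts().iloc[0]


# Hypothetical month: three dry days, one wet day, two dry days
group = pd.DataFrame({'prec': [0.0, 0.0, 0.0, 1.2, 0.0, 0.0]})
result = func_zero(group)  # 3 (the question's version with - 1 gives 2 here)
```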

Upvotes: 1
