Ankita
Ankita

Reputation: 485

Data grouping into weekyly, monthly and yearly for large datasets using python?

I have datasets that record 'X' value for the 20 years in Dataframe format. X records the data with 3 hrs average and sample of data are given below.

   Time_stamp             X
 1992-01-01 03:00:00    10.2
 1992-01-01 06:00:00    10.4
 1992-01-01 09:00:00    11.8
 1992-01-01 12:00:00    12.0
 1992-01-01 15:00:00    10.4
 1992-01-01 18:00:00    9.4
 1992-01-01 21:00:00    10.4
 1992-01-02 00:00:00    13.6
 1992-01-02 03:00:00    13.2
 1992-01-02 06:00:00    11.8
 1992-01-02 09:00:00    12.0
 1992-01-02 12:00:00    12.8
 1992-01-02 15:00:00    12.6
 1992-01-02 18:00:00    11.0
 1992-01-02 21:00:00    12.2
 1992-01-03 00:00:00    13.8
 1992-01-03 03:00:00    14.0
 1992-01-03 06:00:00    13.4
 1992-01-03 09:00:00    14.2
 1992-01-03 12:00:00    16.2
 1992-01-03 15:00:00    13.2
 1992-01-03 18:00:00    13.4
 1992-01-03 21:00:00    13.8
 1992-01-04 00:00:00    14.8
 1992-01-04 03:00:00    13.8
 1992-01-04 06:00:00    7.6
 1992-01-04 09:00:00    5.8
 1992-01-04 12:00:00    4.4
 1992-01-04 15:00:00    5.6
 1992-01-04 18:00:00    6.0
 1992-01-04 21:00:00    7.0
 1992-01-05 00:00:00    6.8
 1992-01-05 03:00:00    3.4
 1992-01-05 06:00:00    5.8
 1992-01-05 09:00:00    10.6
 1992-01-05 12:00:00    9.2
 1992-01-05 15:00:00    10.6
 1992-01-05 18:00:00    9.8
 1992-01-05 21:00:00    11.2
 1992-01-06 00:00:00    12.0
 1992-01-06 03:00:00    10.2
 1992-01-06 06:00:00    9.0
 1992-01-06 09:00:00    9.0
 1992-01-06 12:00:00    8.6
 1992-01-06 15:00:00    8.4
 1992-01-06 18:00:00    8.2
 1992-01-06 21:00:00    8.8
 1992-01-07 00:00:00    10.0
 1992-01-07 03:00:00    9.6
 1992-01-07 06:00:00    8.0
 1992-01-07 09:00:00    9.6
 1992-01-07 12:00:00    10.8
 1992-01-07 15:00:00    10.2
 1992-01-07 18:00:00    9.8
 1992-01-07 21:00:00    10.2
 1992-01-08 00:00:00    9.4
 1992-01-08 03:00:00    11.4
 1992-01-08 06:00:00    12.6
 1992-01-08 09:00:00    12.8
 1992-01-08 12:00:00    10.4
 1992-01-08 15:00:00    11.2
 1992-01-08 18:00:00    9.0
 1992-01-08 21:00:00    10.2
 1992-01-09 00:00:00    8.2

I would like to create seperate dataframe that calcute and records yearly mean, weekly mean and daily mean of the given datasets. I am new to python and just started working with time series data. I found some question related to this here at stackoverflow but did not find appropriate answer related to this aand did not find any idea how to start with. Any help on this ? I wrote this code so far,

import pandas as pd
import numpy as np

datasets['date_minus_time'] = df["Time_stamp"].apply( lambda df : 
datetime.datetime(year=datasets.year, month=datasets.month, 
day=datasets.day))  
datasets.set_index(df["date_minus_time"],inplace=True)

df['count'].resample('D', how='sum')
df['count'].resample('W', how='sum')
df['count'].resample('M', how='sum')

But not getting a clue how to include that data records every 3 hrs. and what should to next for my desired result.

Upvotes: 0

Views: 52

Answers (2)

jezrael
jezrael

Reputation: 862591

Use to_datetime for datetimes in column for improve performance and then DataFrame.resample with parameter on for specify datetime column:

df['Time_stamp'] = pd.to_datetime(df['Time_stamp'])

df_daily = df.resample('D', on='Time_stamp').mean()
df_monthly = df.resample('M', on='Time_stamp').mean()
df_weekly = df.resample('W', on='Time_stamp').mean()

Upvotes: 1

luigigi
luigigi

Reputation: 4215

You may use:

df['Time_stamp'] = pd.to_datetime(df['Time_stamp'], format='%Y-%m-%d %H:%M:%S')
df.set_index('Time_stamp',inplace=True)
df_monthly = df.resample('M').mean()

df_monthly outputs:

                    X
Time_stamp           
1992-01-31  10.403125

For daily mean use: df_daily = df.resample('D').mean() which outputs:

                    X
Time_stamp           
1992-01-01  10.657143
1992-01-02  12.400000
1992-01-03  14.000000
1992-01-04   8.125000
1992-01-05   8.425000
1992-01-06   9.275000
1992-01-07   9.775000
1992-01-08  10.875000
1992-01-09   8.200000

Upvotes: 0

Related Questions