GKC
GKC

Reputation: 479

Generate daily periodic interval dataframes from a large time series dataframe

I have 31-day time series dataset which I extract using the below query.

import pymysql
import pymysql.cursors
import pandas as pd
import sys
import csv

conn = pymysql.connect(host='',user='',password='',db='',port='',charset='utf8mb4',cursorclass=pymysql.cursors.DictCursor)

dbquery = """SELECT * FROM 'table_name' where 'date' between ('2012-03-01 00:00:00') and ('2012-03-31 23:59:59') """

df = pd.read_sql_query(dbquery, conn)
df.to_csv('one_month_timeseries_data.csv', sep=',', encoding='utf-8')

The extracted dataframe is structured as below.

id date                  value
1  2012-01-01 00:00:00   33
5  2012-01-02 00:00:01   15
.
.
.
.
2  2012-01-30 23:59:58   10
5  2012-01-31 23:59:59   5

I want to further loop through each day (00:00:00 - 23:59:59) through the entire dataset and split the dataframe into 3 separate dataframes of 8 hour intervals each grouped into sleep (00:00:00 - 07:59:59), work (08:00:00 - 15:59:59), home (16:00:00 - 23:59:59).

Expected output:

e.g., for day 1:

day1_df_sleep:

id date                  value
1  2012-01-01 00:00:00   33
.
.
.
.
3  2012-01-01 07:59:59   10

day1_df_work:

id date                   value
1  2012-01-01 08:00:00    12
.
.
.
.
4  2012-01-01 15:59:59    50

day1_df_home:

id date                   value
1  2012-01-01 16:00:00    12
.
.
.
.
4  2012-01-01 23:59:59    50

etc..until day 31

I have read about pandas' date_range() function but I am not sure how best to implement it where I only have the date column.

How best can I implement this using pandas?

Upvotes: 1

Views: 336

Answers (1)

David Erickson
David Erickson

Reputation: 16683

Use dt.day and dt.hour and create new dataframes by filtering by the different days and different times of the day. You cannot use between() as an alternative to .lt and .ge, because the lowerbound is inclusive while the upper bound is exclusive in your desired output. As a separate problem, you can loop through each day, and below is what you would do for each day, and you could set a variable for day when looping through. Looping through each day and creating dataframes for each day is obviously going to be a little bit more complicated.

I don't know the reason for creating separated dataframes by day, but why not just have a function with a varaible for day to return output, which you could call with a parameter for user input, writing it back to sql or excel, etc. without having to create separate dataframes.

df['date'] = pd.to_datetime(df['date'])
day1_df_sleep = df[df['date'].dt.day == 1 & df['date'].dt.hour.ge(0) & df['date'].dt.hour.lt(8)]
day1_df_work = df[df['date'].dt.day == 1 & df['date'].dt.hour.ge(8) & df['date'].dt.hour.lt(16)]
day1_df_home = df[df['date'].dt.day == 1 & df['date'].dt.hour.ge(16) & df['date'].dt.hour.lt(24)]

For example you could do something like this to loop:

df = d.copy()
df['date'] = pd.to_datetime(df['date'])
for i in range(1,31):
    day1_df_sleep = df[df['date'].dt.day == i & df['date'].dt.hour.ge(0) & df['date'].dt.hour.lt(8)]
    day1_df_work = df[df['date'].dt.day == i & df['date'].dt.hour.ge(8) & df['date'].dt.hour.lt(16)]
    day1_df_home = df[df['date'].dt.day == i & df['date'].dt.hour.ge(16) & df['date'].dt.hour.lt(24)]
    print(day1_df_sleep, day1_df_work, day1_df_home)

Upvotes: 2

Related Questions