Reputation: 479
I have a 31-day time series dataset, which I extract using the query below.
import pymysql
import pymysql.cursors
import pandas as pd
import sys
import csv
conn = pymysql.connect(host='',user='',password='',db='',port='',charset='utf8mb4',cursorclass=pymysql.cursors.DictCursor)
dbquery = """SELECT * FROM `table_name` WHERE `date` BETWEEN '2012-03-01 00:00:00' AND '2012-03-31 23:59:59'"""
df = pd.read_sql_query(dbquery, conn)
df.to_csv('one_month_timeseries_data.csv', sep=',', encoding='utf-8')
The extracted dataframe is structured as below.
id date value
1 2012-01-01 00:00:00 33
5 2012-01-02 00:00:01 15
.
.
.
.
2 2012-01-30 23:59:58 10
5 2012-01-31 23:59:59 5
I want to loop through each day (00:00:00 - 23:59:59) of the dataset and split the dataframe into three separate dataframes, one per 8-hour interval: sleep (00:00:00 - 07:59:59), work (08:00:00 - 15:59:59), and home (16:00:00 - 23:59:59).
Expected output:
e.g., for day 1:
day1_df_sleep:
id date value
1 2012-01-01 00:00:00 33
.
.
.
.
3 2012-01-01 07:59:59 10
day1_df_work:
id date value
1 2012-01-01 08:00:00 12
.
.
.
.
4 2012-01-01 15:59:59 50
day1_df_home:
id date value
1 2012-01-01 16:00:00 12
.
.
.
.
4 2012-01-01 23:59:59 50
and so on, until day 31.
I have read about pandas' date_range() function but I am not sure how best to implement it where I only have the date column.
How best can I implement this using pandas?
Upvotes: 1
Views: 336
Reputation: 16683
Use dt.day and dt.hour, and create new dataframes by filtering on the day and on the time of day. Don't substitute between() with its default settings for .ge and .lt: by default it includes both endpoints, whereas your desired output needs an inclusive lower bound and an exclusive upper bound. The code below shows the filtering for a single day. To cover the whole month, you can loop over the days and use a variable for the day number, although creating separate dataframes for every day gets a little more complicated.
I don't know why you need a separate dataframe for each day. Why not write a function that takes the day as a variable and returns the output? You could call it with a parameter from user input and write the result back to SQL, Excel, etc., without having to create separate dataframes.
df['date'] = pd.to_datetime(df['date'])
# Parentheses around the == comparison are required: & binds more tightly than ==
day1_df_sleep = df[(df['date'].dt.day == 1) & df['date'].dt.hour.ge(0) & df['date'].dt.hour.lt(8)]
day1_df_work = df[(df['date'].dt.day == 1) & df['date'].dt.hour.ge(8) & df['date'].dt.hour.lt(16)]
day1_df_home = df[(df['date'].dt.day == 1) & df['date'].dt.hour.ge(16) & df['date'].dt.hour.lt(24)]
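The function idea mentioned above could be sketched roughly as follows. This is only a minimal sketch: the name split_day and the small sample frame are hypothetical, and it assumes the date column has already been converted with pd.to_datetime. Filtering on the day of the month alone is only safe when the data covers a single month, as it does here.

```python
import pandas as pd

def split_day(df, day):
    """Return (sleep, work, home) dataframes for one day of the month.

    split_day is a hypothetical helper name, not part of pandas.
    """
    on_day = df[df['date'].dt.day == day]
    hour = on_day['date'].dt.hour
    sleep = on_day[hour.lt(8)]               # 00:00:00 - 07:59:59
    work = on_day[hour.ge(8) & hour.lt(16)]  # 08:00:00 - 15:59:59
    home = on_day[hour.ge(16)]               # 16:00:00 - 23:59:59
    return sleep, work, home

# Hypothetical sample data, only to show the call
sample = pd.DataFrame({
    'id': [1, 3, 1, 4, 2],
    'date': pd.to_datetime([
        '2012-01-01 00:00:00',
        '2012-01-01 07:59:59',
        '2012-01-01 08:00:00',
        '2012-01-01 23:59:59',
        '2012-01-02 12:00:00',
    ]),
    'value': [33, 10, 12, 50, 7],
})

sleep, work, home = split_day(sample, 1)
```

Each returned frame can then be printed, written to CSV, or sent back to the database, without keeping 93 separately named variables around.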
For example you could do something like this to loop:
df = df.copy()  # work on a copy of the dataframe loaded above
df['date'] = pd.to_datetime(df['date'])
for i in range(1, 32):  # days 1 to 31 inclusive
    day_df_sleep = df[(df['date'].dt.day == i) & df['date'].dt.hour.ge(0) & df['date'].dt.hour.lt(8)]
    day_df_work = df[(df['date'].dt.day == i) & df['date'].dt.hour.ge(8) & df['date'].dt.hour.lt(16)]
    day_df_home = df[(df['date'].dt.day == i) & df['date'].dt.hour.ge(16) & df['date'].dt.hour.lt(24)]
    print(day_df_sleep, day_df_work, day_df_home)
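If you do want to keep every piece rather than just print it, one alternative to the explicit loop is to label each row with its 8-hour block and let groupby do the splitting. This is a sketch of a different technique (pd.cut plus groupby), not the filtering approach above; the sample data and the names period and frames are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the queried dataframe
df = pd.DataFrame({
    'id': [1, 3, 1, 4, 2],
    'date': pd.to_datetime([
        '2012-01-01 00:00:00',
        '2012-01-01 07:59:59',
        '2012-01-01 08:00:00',
        '2012-01-01 23:59:59',
        '2012-01-02 12:00:00',
    ]),
    'value': [33, 10, 12, 50, 7],
})

day = df['date'].dt.day.rename('day')

# right=False gives the half-open bins [0, 8), [8, 16), [16, 24)
period = pd.cut(
    df['date'].dt.hour,
    bins=[0, 8, 16, 24],
    right=False,
    labels=['sleep', 'work', 'home'],
).rename('period')

# One dataframe per (day, period) pair that actually occurs in the data
frames = {
    key: group
    for key, group in df.groupby([day, period], observed=True)
}

print(frames[(1, 'sleep')])
```

The dictionary keys are (day, period) tuples such as (1, 'sleep'), so a lookup replaces a variable name like day1_df_sleep. Combinations with no rows are simply absent from the dictionary.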
Upvotes: 2