Souvik Ray
Souvik Ray

Reputation: 3018

How to get data on an hourly basis

I have a csv file whose content is below

2018-02-28 09:48:18.884392+05:30,,
2018-03-04 10:50:34.833787+05:30,,
2018-03-05 13:04:23.634013+05:30,,
2018-03-14 05:30:14.51227+05:30,28.84,27.58
2018-03-14 05:45:14.51227+05:30,12.54,17.47
2018-03-14 06:30:14.466206+05:30,25.1,23.58
2018-03-14 06:40:14.466206+05:30,11.2,14.44
2018-03-14 07:18:14.493826+05:30,21.96,21.54
2018-03-14 08:30:14.593973+05:30,20.48,26.86
2018-03-14 09:30:14.481426+05:30,22.92,15.3
2018-03-14 10:31:20.307558+05:30,7.46,0
2018-03-14 11:30:14.556135+05:30,21,16.5
2018-03-14 12:30:14.569207+05:30,14.14,19.14
2018-03-14 13:11:14.470991+05:30,8.84,6.98
2018-03-14 14:20:14.500747+05:30,8.94,4.5
2018-03-14 15:30:14.487262+05:30,5.92,3.86
2018-03-14 16:30:14.454833+05:30,6.58,10.88
2018-03-14 17:30:14.482084+05:30,7.32,3.36
2018-03-14 18:27:14.559508+05:30,5.52,3.6
2018-03-14 19:30:14.611782+05:30,2.74,3.14
2018-03-14 20:30:14.461808+05:30,4.34,3.2
2018-03-14 21:30:14.533157+05:30,3.8,3.22
2018-03-14 22:15:14.451542+05:30,4.44,3.06
2018-03-14 23:30:14.5494+05:30,3.04,2.92
2018-03-15 00:30:14.477848+05:30,4.68,7.82

Here the first column is the date, the second is the entry for upload speed and the last is the download speed.

I need to display data on an hourly basis for all hours between two specific dates 2018-03-05 and 2018-03-14 such that any number of entries (upload and download speed) made for a paricular hour, I could get the average for those and display the average value for the specific hour.

Here is my code below.

import pandas as pd
import numpy as np


df = pd.read_csv("file.csv", header=None,
                 names=["date", "upload", "download"], parse_dates=["date"])
df.set_index("date", inplace=True)
df.fillna(0, inplace=True)
df.index = df.index.tz_localize('UTC').tz_convert('Asia/Kolkata')
# get data for the specified dates
df2 = df.loc['2018-03-05': '2018-03-14']
# add hourly frequency
print(df2.resample('1H').last())

Below is the format I get

                             upload  download
date                                       
2018-03-05 13:00:00+05:30    0.00      0.00
2018-03-05 14:00:00+05:30     NaN       NaN
2018-03-05 15:00:00+05:30     NaN       NaN
2018-03-05 16:00:00+05:30     NaN       NaN
2018-03-05 17:00:00+05:30     NaN       NaN
2018-03-05 18:00:00+05:30     NaN       NaN
2018-03-05 19:00:00+05:30     NaN       NaN
2018-03-05 20:00:00+05:30     NaN       NaN
2018-03-05 21:00:00+05:30     NaN       NaN
2018-03-05 22:00:00+05:30     NaN       NaN
2018-03-05 23:00:00+05:30     NaN       NaN
2018-03-06 00:00:00+05:30     NaN       NaN
2018-03-06 01:00:00+05:30     NaN       NaN
2018-03-06 02:00:00+05:30     NaN       NaN
2018-03-06 03:00:00+05:30     NaN       NaN
2018-03-06 04:00:00+05:30     NaN       NaN
2018-03-06 05:00:00+05:30     NaN       NaN
2018-03-06 06:00:00+05:30     NaN       NaN
2018-03-06 07:00:00+05:30     NaN       NaN
2018-03-06 08:00:00+05:30     NaN       NaN
2018-03-06 09:00:00+05:30     NaN       NaN
2018-03-06 10:00:00+05:30     NaN       NaN
2018-03-06 11:00:00+05:30     NaN       NaN
2018-03-06 12:00:00+05:30     NaN       NaN
2018-03-06 13:00:00+05:30     NaN       NaN
2018-03-06 14:00:00+05:30     NaN       NaN
2018-03-06 15:00:00+05:30     NaN       NaN
2018-03-06 16:00:00+05:30     NaN       NaN
2018-03-06 17:00:00+05:30     NaN       NaN
2018-03-06 18:00:00+05:30     NaN       NaN
...                           ...       ...
2018-03-13 18:00:00+05:30     NaN       NaN
2018-03-13 19:00:00+05:30     NaN       NaN
2018-03-13 20:00:00+05:30     NaN       NaN
2018-03-13 21:00:00+05:30     NaN       NaN
2018-03-13 22:00:00+05:30     NaN       NaN
2018-03-13 23:00:00+05:30     NaN       NaN
2018-03-14 00:00:00+05:30     NaN       NaN
2018-03-14 01:00:00+05:30     NaN       NaN
2018-03-14 02:00:00+05:30     NaN       NaN
2018-03-14 03:00:00+05:30     NaN       NaN
2018-03-14 04:00:00+05:30     NaN       NaN
2018-03-14 05:00:00+05:30   12.54     17.47
2018-03-14 06:00:00+05:30   11.20     14.44
2018-03-14 07:00:00+05:30   21.96     21.54
2018-03-14 08:00:00+05:30   20.48     26.86
2018-03-14 09:00:00+05:30   22.92     15.30
2018-03-14 10:00:00+05:30    7.46      0.00
2018-03-14 11:00:00+05:30   21.00     16.50
2018-03-14 12:00:00+05:30   14.14     19.14
2018-03-14 13:00:00+05:30    8.84      6.98
2018-03-14 14:00:00+05:30    8.94      4.50
2018-03-14 15:00:00+05:30    5.92      3.86
2018-03-14 16:00:00+05:30    6.58     10.88
2018-03-14 17:00:00+05:30    7.32      3.36
2018-03-14 18:00:00+05:30    5.52      3.60
2018-03-14 19:00:00+05:30    2.74      3.14
2018-03-14 20:00:00+05:30    4.34      3.20
2018-03-14 21:00:00+05:30    3.80      3.22
2018-03-14 22:00:00+05:30    4.44      3.06
2018-03-14 23:00:00+05:30    3.04      2.92

I do get data on hourly basis but it seems wrong.If you observe carefully, for the date 2018-03-14, the original data says at 5:30, my reading was 28.84 and 27.58 respectively and at 5:45, my reading was 12.54 and 17.47.But the formatted data says, at 5:00, the reading was 12.54 and 17.47.It seems its picking the latest entry for the particular hour.The same could be said for other time durations as well.

How do I display data on hourly basis for all hours between the two specified dates containing the average value for the entries made for the particular hour or 0 if no entries were made?

Upvotes: 2

Views: 217

Answers (1)

shivsn
shivsn

Reputation: 7828

IIUC you are using last() which gives last value instead use mean():

df.resample('H').mean().fillna(0)

Upvotes: 1

Related Questions