Reputation: 3018
I have a csv file whose content is below
2018-02-28 09:48:18.884392+05:30,,
2018-03-04 10:50:34.833787+05:30,,
2018-03-05 13:04:23.634013+05:30,,
2018-03-14 05:30:14.51227+05:30,28.84,27.58
2018-03-14 05:45:14.51227+05:30,12.54,17.47
2018-03-14 06:30:14.466206+05:30,25.1,23.58
2018-03-14 06:40:14.466206+05:30,11.2,14.44
2018-03-14 07:18:14.493826+05:30,21.96,21.54
2018-03-14 08:30:14.593973+05:30,20.48,26.86
2018-03-14 09:30:14.481426+05:30,22.92,15.3
2018-03-14 10:31:20.307558+05:30,7.46,0
2018-03-14 11:30:14.556135+05:30,21,16.5
2018-03-14 12:30:14.569207+05:30,14.14,19.14
2018-03-14 13:11:14.470991+05:30,8.84,6.98
2018-03-14 14:20:14.500747+05:30,8.94,4.5
2018-03-14 15:30:14.487262+05:30,5.92,3.86
2018-03-14 16:30:14.454833+05:30,6.58,10.88
2018-03-14 17:30:14.482084+05:30,7.32,3.36
2018-03-14 18:27:14.559508+05:30,5.52,3.6
2018-03-14 19:30:14.611782+05:30,2.74,3.14
2018-03-14 20:30:14.461808+05:30,4.34,3.2
2018-03-14 21:30:14.533157+05:30,3.8,3.22
2018-03-14 22:15:14.451542+05:30,4.44,3.06
2018-03-14 23:30:14.5494+05:30,3.04,2.92
2018-03-15 00:30:14.477848+05:30,4.68,7.82
Here the first column is the date, the second is the entry for upload speed and the last is the download speed.
I need to display data on an hourly basis for all hours between two specific dates 2018-03-05
and 2018-03-14
such that any number of entries (upload and download speed) made for a paricular hour, I could get the average for those and display the average value for the specific hour.
Here is my code below.
import pandas as pd
import numpy as np
df = pd.read_csv("file.csv", header=None,
names=["date", "upload", "download"], parse_dates=["date"])
df.set_index("date", inplace=True)
df.fillna(0, inplace=True)
df.index = df.index.tz_localize('UTC').tz_convert('Asia/Kolkata')
# get data for the specified dates
df2 = df.loc['2018-03-05': '2018-03-14']
# add hourly frequency
print(df2.resample('1H').last())
Below is the format I get
upload download
date
2018-03-05 13:00:00+05:30 0.00 0.00
2018-03-05 14:00:00+05:30 NaN NaN
2018-03-05 15:00:00+05:30 NaN NaN
2018-03-05 16:00:00+05:30 NaN NaN
2018-03-05 17:00:00+05:30 NaN NaN
2018-03-05 18:00:00+05:30 NaN NaN
2018-03-05 19:00:00+05:30 NaN NaN
2018-03-05 20:00:00+05:30 NaN NaN
2018-03-05 21:00:00+05:30 NaN NaN
2018-03-05 22:00:00+05:30 NaN NaN
2018-03-05 23:00:00+05:30 NaN NaN
2018-03-06 00:00:00+05:30 NaN NaN
2018-03-06 01:00:00+05:30 NaN NaN
2018-03-06 02:00:00+05:30 NaN NaN
2018-03-06 03:00:00+05:30 NaN NaN
2018-03-06 04:00:00+05:30 NaN NaN
2018-03-06 05:00:00+05:30 NaN NaN
2018-03-06 06:00:00+05:30 NaN NaN
2018-03-06 07:00:00+05:30 NaN NaN
2018-03-06 08:00:00+05:30 NaN NaN
2018-03-06 09:00:00+05:30 NaN NaN
2018-03-06 10:00:00+05:30 NaN NaN
2018-03-06 11:00:00+05:30 NaN NaN
2018-03-06 12:00:00+05:30 NaN NaN
2018-03-06 13:00:00+05:30 NaN NaN
2018-03-06 14:00:00+05:30 NaN NaN
2018-03-06 15:00:00+05:30 NaN NaN
2018-03-06 16:00:00+05:30 NaN NaN
2018-03-06 17:00:00+05:30 NaN NaN
2018-03-06 18:00:00+05:30 NaN NaN
... ... ...
2018-03-13 18:00:00+05:30 NaN NaN
2018-03-13 19:00:00+05:30 NaN NaN
2018-03-13 20:00:00+05:30 NaN NaN
2018-03-13 21:00:00+05:30 NaN NaN
2018-03-13 22:00:00+05:30 NaN NaN
2018-03-13 23:00:00+05:30 NaN NaN
2018-03-14 00:00:00+05:30 NaN NaN
2018-03-14 01:00:00+05:30 NaN NaN
2018-03-14 02:00:00+05:30 NaN NaN
2018-03-14 03:00:00+05:30 NaN NaN
2018-03-14 04:00:00+05:30 NaN NaN
2018-03-14 05:00:00+05:30 12.54 17.47
2018-03-14 06:00:00+05:30 11.20 14.44
2018-03-14 07:00:00+05:30 21.96 21.54
2018-03-14 08:00:00+05:30 20.48 26.86
2018-03-14 09:00:00+05:30 22.92 15.30
2018-03-14 10:00:00+05:30 7.46 0.00
2018-03-14 11:00:00+05:30 21.00 16.50
2018-03-14 12:00:00+05:30 14.14 19.14
2018-03-14 13:00:00+05:30 8.84 6.98
2018-03-14 14:00:00+05:30 8.94 4.50
2018-03-14 15:00:00+05:30 5.92 3.86
2018-03-14 16:00:00+05:30 6.58 10.88
2018-03-14 17:00:00+05:30 7.32 3.36
2018-03-14 18:00:00+05:30 5.52 3.60
2018-03-14 19:00:00+05:30 2.74 3.14
2018-03-14 20:00:00+05:30 4.34 3.20
2018-03-14 21:00:00+05:30 3.80 3.22
2018-03-14 22:00:00+05:30 4.44 3.06
2018-03-14 23:00:00+05:30 3.04 2.92
I do get data on hourly basis but it seems wrong.If you observe carefully, for the date 2018-03-14
, the original data says at 5:30
, my reading was
28.84 and 27.58 respectively and at 5:45
, my reading was 12.54 and 17.47.But the formatted data says, at 5:00
, the reading was 12.54 and 17.47.It seems its picking the latest entry for the particular hour.The same could be said for other time durations as well.
How do I display data on hourly basis for all hours between the two specified dates containing the average value for the entries made for the particular hour or 0 if no entries were made?
Upvotes: 2
Views: 217
Reputation: 7828
IIUC you are using last()
which gives last value instead use mean()
:
df.resample('H').mean().fillna(0)
Upvotes: 1