Reputation: 51
I am working on the Metro Interstate Traffic Volume data set (available here: http://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume) but I can't resample the dataset to show the average traffic volume per day instead of showing it per hour.
metro = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
metro['date_time'] = pd.to_datetime(metro['date_time'], format='%Y-%m-%d %H:%M:%S')
metro.set_index('date_time', inplace=True, drop=True)
metro.resample('1Y').mean()
This is what I get:
holiday temp ... weather_description traffic_volume
date_time ...
2012-10-02 09:00:00 None 288.28 ... scattered clouds 5545
2012-10-02 10:00:00 None 289.36 ... broken clouds 4516
2012-10-02 11:00:00 None 289.58 ... overcast clouds 4767
2012-10-02 12:00:00 None 290.13 ... overcast clouds 5026
2012-10-02 13:00:00 None 291.14 ... broken clouds 4918
... ... ... ... ... ...
2018-09-30 19:00:00 None 283.45 ... broken clouds 3543
2018-09-30 20:00:00 None 282.76 ... overcast clouds 2781
2018-09-30 21:00:00 None 282.73 ... proximity thunderstorm 2159
2018-09-30 22:00:00 None 282.09 ... overcast clouds 1450
2018-09-30 23:00:00 None 282.12 ... overcast clouds 954
[48204 rows x 8 columns]
Do you have any idea on how to solve it?
Edit:
Also, I checked the pandas reference for resampling (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html) and executed the following example code:
d = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2018',
periods=8,
freq='W')
df
price volume week_starting
0 10 50 2018-01-07
1 11 60 2018-01-14
2 9 40 2018-01-21
3 13 100 2018-01-28
4 14 50 2018-02-04
5 18 100 2018-02-11
6 17 40 2018-02-18
7 19 50 2018-02-25
df.resample('M', on='week_starting').mean()
price volume
week_starting
2018-01-31 10.75 62.5
2018-02-28 17.00 60.0
However, the result for me is the same before and after the resampling.
Upvotes: 2
Views: 186
Reputation: 2819
You can try by creating a year column:
metro['year'] =metro['date_time'].dt.year
metro.groupby['year'].mean()
Upvotes: 1
Reputation: 150735
One year is not a fix time period: some years have 365 days, some have 366. You can use groupby
:
metro = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
metro['date_time'] = pd.to_datetime(metro['date_time'], format='%Y-%m-%d %H:%M:%S')
# extract the yer
metro.groupby(metro['date_time'].dt.year).mean()
Output:
temp rain_1h snow_1h clouds_all traffic_volume
date_time
2012 274.991782 0.000000 0.000000 65.295819 3207.802657
2013 278.976352 0.161284 0.000000 52.560947 3286.762160
2014 276.786438 0.243251 0.000000 49.070469 3250.938004
2015 287.689574 0.339218 0.001795 40.988338 3242.900983
2016 282.520790 1.192969 0.000308 48.628842 3169.441328
2017 281.463309 0.000000 0.000000 50.005281 3340.703065
2018 282.851502 0.121765 0.000000 45.567996 3260.112341
Upvotes: 2