Mar
Mar

Reputation: 411

grouping and plotting data by time in python

I have a csv file and I am trying to plot the average of some values per month. My csv file is structured as shown below, so I believe that I should group my data daily, then monthly in order to calculate the mean value.

timestamp,heure,lat,lon,impact,type
2007-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-01-02 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-01-03 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2007-01-03 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1

I am using this code:

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

df= pd.read_csv("ave.txt", sep=',', names =["timestamp","heure","lat","lon","impact","type"])
daily = df.set_index('timestamp').groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()

But, I keep getting errors like this:

KeyError: 'The grouper name timestamp is not found'

any ideas ??

Upvotes: 4

Views: 5410

Answers (2)

jezrael
jezrael

Reputation: 862431

I believe you need DataFrame.resample.

Also is necessary convert timestamp to DataTimeindex by parameter parse_dates and index_col in read_csv.

names =["timestamp","heure","lat","lon","impact","type"]
data = pd.read_csv('fou.txt',names=names, parse_dates=['timestamp'],index_col=['timestamp'])
print (data.head())

#your code
daily = data.groupby(pd.TimeGrouper(freq='D'))['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()

#more simpliest
daily = data.resample('D')['impact'].count()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()

graph

Also check if really need count, not size. What is the difference between size and count in pandas?

daily = data.resample('D')['impact'].size()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()

Upvotes: 1

Andrew L
Andrew L

Reputation: 7038

You're getting this error because you've set the timestamp column to index. Try removing key='timestamp' from TimeGrouper() or the set_index method and it should group as you expect:

daily = df.set_index('timestamp').groupby(pd.TimeGrouper(freq='D', axis=1), axis=1)['impact'].count()

or

daily = df.groupby(pd.TimeGrouper(key='timestamp', freq='D', axis=1), axis=1)['impact'].count()

Upvotes: 3

Related Questions