Reputation: 115
I am new to pandas and I am trying to plot the number of events as a function of time, at different time resolutions.
My data file looks like this:
223789 213163 1341100972
223789 213163 1341100972
376989 50329 1341101181
26375 168366 1341101183
376989 13813 1341101192
...
The third column is the timestamp. I want to read the file and plot the number of rows per timestamp. This is what I do:
data = read_table(file_name, sep=' ', header=None, names=['u1','u2','timestamp'], dtype={'timestamp': np.int64}, parse_dates=[2], date_parser=datetime.datetime.fromtimestamp)
data.groupby('timestamp').size().plot()
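As a point of reference, here is a minimal runnable sketch of the pipeline above, using the sample rows inline via io.StringIO instead of a file, and pd.to_datetime(..., unit='s') in place of the date_parser argument (which newer pandas versions deprecate); the grouping step is the same:

```python
import io

import pandas as pd

# The sample rows from the question, stood in for the data file.
raw = """223789 213163 1341100972
223789 213163 1341100972
376989 50329 1341101181
26375 168366 1341101183
376989 13813 1341101192
"""

data = pd.read_table(io.StringIO(raw), sep=' ', header=None,
                     names=['u1', 'u2', 'timestamp'])
# Convert the Unix-epoch column to datetimes.
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='s')

# One count per distinct second; the duplicated first timestamp yields 2.
counts = data.groupby('timestamp').size()
print(counts)
```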
This works if I am fine with a resolution of seconds, but I don't understand the best way to aggregate the data to obtain a resolution of minutes or hours. In fact, if I do:
data.groupby(data['timestamp'].map(lambda t: t.hour)).size().plot()
the problem is that all the rows referring to the same hour on different days are aggregated together, whereas I would like to keep the time ordering.
I haven't found a solution by browsing related posts and Stack Overflow questions. Could anyone help, please?
Thanks!
Upvotes: 0
Views: 875
Reputation: 77027
Using the TimeGrouper method, you could do this:
data.set_index('timestamp').groupby(pd.TimeGrouper('1D')).count()
First set_index to 'timestamp', then groupby over a one-day period ('1D').
Similarly, for minute resolution:
data.set_index('timestamp').groupby(pd.TimeGrouper('60s')).count()
and for hourly resolution:
data.set_index('timestamp').groupby(pd.TimeGrouper('1H')).count()
Upvotes: 1