Reputation: 115
I am new to pandas and I am trying to plot the number of events as a function of time, at different time resolutions.
My data file looks like this:
223789 213163 1341100972
223789 213163 1341100972
376989 50329 1341101181
26375 168366 1341101183
376989 13813 1341101192
...
The third column is the timestamp. I want to read the file and plot the number of rows per timestamp. This is what I do:
data = read_table(file_name, sep=' ', header=None, names=['u1','u2','timestamp'], dtype={'timestamp': np.int64}, parse_dates=[2], date_parser=datetime.datetime.fromtimestamp)
data.groupby('timestamp').size().plot()
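As a point of reference, here is a minimal runnable sketch of the pipeline above, using the sample rows inline via io.StringIO instead of a file, and pd.to_datetime(..., unit='s') in place of the date_parser argument (which newer pandas versions deprecate); the grouping step is the same:

```python
import io

import pandas as pd

# The sample rows from the question, stood in for the data file.
raw = """223789 213163 1341100972
223789 213163 1341100972
376989 50329 1341101181
26375 168366 1341101183
376989 13813 1341101192
"""

data = pd.read_table(io.StringIO(raw), sep=' ', header=None,
                     names=['u1', 'u2', 'timestamp'])
# Convert the Unix-epoch column to datetimes.
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='s')

# One count per distinct second; the duplicated first timestamp yields 2.
counts = data.groupby('timestamp').size()
print(counts)
```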
This works if I am fine with a resolution of seconds, but I don't understand the best way to aggregate the data to obtain a resolution of minutes or hours. In fact, if I do:
data.groupby(data['timestamp'].map(lambda t: t.hour)).size().plot()
the problem is that all the rows referring to the same hour on different days are aggregated together, whereas I would like to keep the time ordering.
I haven't found a solution by browsing related posts and Stack Overflow questions. Could anyone help, please?
Thanks!
Upvotes: 0
Views: 875
Reputation: 77027
Using the TimeGrouper method, you could do this:
data.set_index('timestamp').groupby(pd.TimeGrouper('1D')).count()
First set_index to 'timestamp', then groupby over a one-day period ('1D').
Similarly, for minute resolution:
data.set_index('timestamp').groupby(pd.TimeGrouper('60s')).count()
and for hourly resolution:
data.set_index('timestamp').groupby(pd.TimeGrouper('1H')).count()
Upvotes: 1