Reputation: 13015
There is a data table shown as follows:
              timestamp                 id
785340 2016-04-01 00:01:19.000 4624
785341 2016-04-01 00:01:19.000 4624
785342 2016-04-01 00:02:20.000 4624
794012 2016-04-01 00:02:21.000 4624
18866 2016-04-01 00:02:28.142 4769
794198 2016-04-01 00:03:31.000 4624
18906 2016-04-01 00:03:40.130 4648
18907 2016-04-01 00:04:40.155 4648
18908 2016-04-01 00:05:40.157 4648
797062 2016-04-01 00:05:48.000 4624
The table has two columns, timestamp and id. The timestamp column was converted with df.table$timestamp <- as.POSIXct(df.table$timestamp).
The first (unnamed) column holds the row numbers from the original table, since I re-ordered the rows by timestamp.
Now I want to count the number of ids in each minute: for instance, the 1st minute has 2 ids, the 2nd minute has 3 ids, and so on. In other words, I want to extract the minute information from the timestamps and build a time series with one-minute intervals, where each value is the number of ids falling within that interval. Is there an efficient way to do this? Thanks.
Upvotes: 0
Views: 61
Reputation: 886928
We can do this in base R, without any packages:
aggregate(cbind(count = id) ~ ts_min,
          data = transform(df.table, ts_min = format(timestamp, "%M")),
          FUN = length)
#  ts_min count
#1     01     2
#2     02     3
#3     03     2
#4     04     1
#5     05     2
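One caveat: format(timestamp, "%M") keeps only the minute-of-hour, so if the data ever spans more than one hour, rows from e.g. 00:02 and 01:02 would be pooled into the same bucket. A minimal base-R sketch (using a hypothetical three-row vector, not the question's data) that bins by the full one-minute interval with cut():

```r
# cut() truncates each POSIXct value to the start of its minute while keeping
# the date and hour, so the same minute-of-hour in different hours stays in
# separate bins (and empty minutes in between get a count of 0).
ts <- as.POSIXct(c("2016-04-01 00:01:19",
                   "2016-04-01 00:01:19",
                   "2016-04-01 00:02:20"))   # toy data, not the question's table
counts <- as.data.frame(table(cut(ts, breaks = "1 min")),
                        responseName = "count")
counts
#                  Var1 count
# 1 2016-04-01 00:01:00     2
# 2 2016-04-01 00:02:00     1
```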
Upvotes: 1
Reputation: 5532
You could use lubridate and dplyr.
library(dplyr)
library(lubridate)

df.table %>%
  group_by(ts_min = minute(timestamp)) %>%
  summarise(count = n())
## Source: local data frame [5 x 2]
##
## ts_min count
## (int) (int)
## 1 1 2
## 2 2 3
## 3 3 2
## 4 4 1
## 5 5 2
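Note that minute() extracts only the minute-of-hour, so data spanning more than one hour would pool e.g. 00:02 and 01:02 together. In that case lubridate's floor_date() keeps the full date-time; a sketch with a hypothetical stand-in for df.table:

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for the question's df.table, spanning two hours
df.table <- data.frame(
  timestamp = as.POSIXct(c("2016-04-01 00:01:19",
                           "2016-04-01 00:01:19",
                           "2016-04-01 01:01:05")),
  id = c(4624, 4624, 4648)
)

df.table %>%
  group_by(ts_min = floor_date(timestamp, unit = "minute")) %>%
  summarise(count = n())
# floor_date() truncates to the start of the minute but keeps the date and
# hour, so the 00:01 and 01:01 rows land in different groups
```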
Upvotes: 0