crackpotHouseplant

Reputation: 401

Separating timestamped data into bins

I have a text file containing timestamps with associated button presses. I loaded it into R using RStudio. The button presses are formatted as strings.

52 right 08:16:23
53     a 08:16:23
54    up 08:16:24
55     a 08:16:24
56     b 08:16:24
57     a 08:16:24
58     a 08:16:24
59 right 08:16:24
60     a 08:16:24

The timestamps have been converted into POSIXct timestamps, but came in separate date and time fields in my text file.

I want to break the data into equally spaced bins based on time and count the frequency of each button within these.

There are a handful of buttons and there are a lot of different nonunique timestamps.

Ideally I'd like intervals as small as one minute, and a solution that lets me change the granularity would be great.

Upvotes: 0

Views: 169

Answers (2)

IRTFM

Reputation: 263362

Let's assume you have a data.frame named "dat" with the time strings in a column named "V3" and the POSIXct values in a column named "time", as in the one I created from your text. With your sample data, using seq.POSIXct with an interval of one minute creates only a single break point, and cut cannot handle that, so I changed the time values to cover a wider range. In the process I discovered that my initial attempt with seq.POSIXct returned NA for the upper values: the sequence ends before the max time whenever the seconds are higher in the max time than in the min time, so I added 60 seconds to the max. I used one minute as the interval for this demonstration. You should be able to generalize the code in the obvious locations.

# Initial failed attempt with your data
> grp <- cut(dat$time, breaks=seq(min(dat$time), max(dat$time), by="1 min"), include.lowest=TRUE) 
Error in cut.default(unclass(x), unclass(breaks), labels = labels, right = right,  : 
  'breaks' are not unique

 # Better data, more challenging, allows better testing

dat$grp <- cut(dat$time, breaks=seq(min(dat$time), 
                                      max(dat$time)+60, by="1 min"), 
                           include.lowest=TRUE,right=TRUE)

> dat
  V1    V2       V3                time                 grp
1 52 right 08:16:23 2016-04-17 08:16:23 2016-04-17 08:15:24
2 53     a 08:16:23 2016-04-17 08:16:23 2016-04-17 08:15:24
3 54    up 08:17:59 2016-04-17 08:17:59 2016-04-17 08:17:24
4 55     a 08:18:45 2016-04-17 08:18:45 2016-04-17 08:18:24
5 56     b 08:20:53 2016-04-17 08:20:53 2016-04-17 08:20:24
6 57     a 08:20:01 2016-04-17 08:20:01 2016-04-17 08:19:24
7 58     a  08:17:5 2016-04-17 08:17:05 2016-04-17 08:16:24
8 59 right 08:18:24 2016-04-17 08:18:24 2016-04-17 08:17:24
9 60     a 08:14:24 2016-04-17 08:14:24 2016-04-17 08:14:24

You can get the counts by group with table:

> table(dat$grp)

2016-04-17 08:14:24 2016-04-17 08:15:24 2016-04-17 08:16:24 2016-04-17 08:17:24 
                  1                   2                   1                   2 
2016-04-17 08:18:24 2016-04-17 08:19:24 2016-04-17 08:20:24 
                  1                   1                   1 

See ?table for additional options about handling missing values.
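The question also asks for counts per button and for adjustable granularity. Here is a minimal sketch of one way to get both, assuming the toy data shown above. It passes an interval string such as "1 min" directly to cut, which builds the breaks itself, so no manual seq call is needed. The helper name count_presses and the column name bin are made up for illustration.

```r
# Toy data copied from the example output above (UTC chosen for reproducibility).
dat <- data.frame(
  V2 = c("right", "a", "up", "a", "b", "a", "a", "right", "a"),
  time = as.POSIXct(
    paste("2016-04-17",
          c("08:16:23", "08:16:23", "08:17:59", "08:18:45", "08:20:53",
            "08:20:01", "08:17:05", "08:18:24", "08:14:24")),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: bin the times by `width`, then cross-tabulate bin and button.
# `width` is an interval string accepted by cut() for date-times, e.g. "1 min" or "5 mins".
count_presses <- function(d, width = "1 min") {
  bin <- cut(d$time, breaks = width)  # bins are closed on the left by default
  table(bin = bin, button = d$V2)
}

per_minute <- count_presses(dat, "1 min")
per_5_min  <- count_presses(dat, "5 mins")
print(per_minute)
```

Minutes with no presses stay in the table as rows of zeros because cut returns a factor. Note that with this approach the bins are left-closed and, as far as I can tell, start at the minute of the earliest timestamp with the seconds dropped, so the "5 mins" bins here begin at 08:14:00 rather than at a round multiple of five.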

Upvotes: 1

Alexander

Reputation: 948

These functions may be of interest to you:

The answer depends on whether R recognizes the time as a date-time value. If not, you can use

chron( ... ) 

on your time variable. Please see: http://www.stat.berkeley.edu/~s133/dates.html

bins <- cut(time_variable, number_of_bins)

This should take the max and min of the time variable, divide the range by the number of bins, and assign each time to the appropriate bin. (I named the result bins rather than c to avoid masking the base function c.)

table(bins)

This will return the frequency in each bin.
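To make the "divide the range by the number of bins" idea concrete, here is a small sketch that builds the equally spaced break points by hand with seq and then passes them to cut, rather than passing a bin count. The timestamps and the variable names times, n_bins, and breaks are made up for illustration.

```r
# Made-up timestamps spanning six minutes (UTC chosen for reproducibility).
times <- as.POSIXct(
  paste("2016-04-17",
        c("08:14:00", "08:15:30", "08:15:45", "08:16:23", "08:18:45", "08:20:00")),
  tz = "UTC"
)

n_bins <- 3  # change this to adjust the granularity

# n_bins equal-width bins need n_bins + 1 break points from min to max.
breaks <- seq(min(times), max(times), length.out = n_bins + 1)

# include.lowest = TRUE keeps the maximum time inside the last bin
# (bins are closed on the left by default for date-times).
bins <- cut(times, breaks = breaks, include.lowest = TRUE)

print(table(bins))
```

With this data the three bins are two minutes wide and are labelled by their start times.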

Upvotes: 1
