Marcel Dumont

Reputation: 1207

Combining Minute and Hour factors into minutes within a day with R

I am using library(rga) to retrieve the pageviews for each minute of a given day from the Google Analytics API, using the dimensions "ga:date,ga:hour,ga:minute".

The problem is that the returned data frame has the hours and minutes as ordered factors:

'data.frame':   1440 obs. of  4 variables:
$ date           : Date, format: "2014-03-31" "2014-03-31" "2014-03-31" "2014-03-31" ...
$ hour           : Ord.factor w/ 24 levels "0"<"1"<"2"<"3"<..: 1 1 1 1 1 1 1 1 1 1 ...
$ minute         : Ord.factor w/ 60 levels "0"<"1"<"2"<"3"<..: 1 2 3 4 5 6 7 8 9 10 ...
$ pageviews      : num  212 177 219 217 182 190 179 217 206 183 ...

What I am looking for is an ordered factor of minutes within a day, i.e. 1:1440.

Upvotes: 0

Views: 222

Answers (1)

Spacedman

Reputation: 94202

If you know your data is a complete ordered set of all 1440 minutes, then just do:

d$minfactor = factor(1:1440, ordered=TRUE)

Otherwise:

d$Fmin = factor(60*(as.numeric(d$hour)-1) + as.numeric(d$minute), ordered=TRUE)
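The line above works on the integer codes of the factor levels, which line up with the values here only because the levels are "0", "1", "2", ... in order. A minimal sketch that converts via the level labels instead, assuming a data frame `d` shaped like the one in the question (the toy rows and the `minute_of_day` column name are made up for illustration):

```r
# Toy stand-in for the data frame in the question (values are made up).
d <- data.frame(
  hour   = factor(c(0, 0, 1, 23), levels = 0:23, ordered = TRUE),
  minute = factor(c(0, 59, 0, 59), levels = 0:59, ordered = TRUE)
)

# Convert via the level labels rather than the internal integer codes.
h <- as.integer(as.character(d$hour))    # 0..23
m <- as.integer(as.character(d$minute))  # 0..59

# Minute within the day, 1..1440, as a plain integer.
d$minute_of_day <- 60L * h + m + 1L

# Ordered factor carrying all 1440 levels, even if some minutes are absent.
d$Fmin <- factor(d$minute_of_day, levels = 1:1440, ordered = TRUE)
```

Passing `levels = 1:1440` explicitly keeps the level set fixed when the data has gaps; without it, only the minutes that actually occur become levels.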

You should probably use numbers instead of factors - the ordering in an ordered factor is the ordering of the levels, so you can do things like this:

> z = factor(5:1, ordered=TRUE, levels=5:1)
> z[1] < z[2]
[1] TRUE
> z[1:2]
[1] 5 4
Levels: 5 < 4 < 3 < 2 < 1

which makes it look as if 5 is less than 4.
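A related pitfall, sketched with the same `z`: converting the factor straight to a number returns the position of each level, not the number printed as its label.

```r
z <- factor(5:1, ordered = TRUE, levels = 5:1)

# Internal level codes: position of each value in the level ordering.
codes <- as.integer(z)

# The numbers the labels actually spell out.
values <- as.integer(as.character(z))
```

Here `codes` is 1 2 3 4 5 while `values` is 5 4 3 2 1, so arithmetic on the raw conversion would silently use the wrong numbers.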

It's a fairly strong rule that if your factor levels are best kept as numbers, then they should be numbers. If they are categories, like Male and Female, the best levels are "M" and "F", not 0 and 1. If the levels are ordered but not numeric, use an ordered factor, such as "Small", "Medium", "Large" (where there's no numeric definition of S, M, L).
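As a small illustration of that rule (the variable names and values below are made up), an ordered factor suits the size categories, while a minute-of-day quantity is easier to work with as a plain number:

```r
# Ordered but non-numeric categories: an ordered factor fits.
size <- factor(c("Medium", "Small", "Large"),
               levels = c("Small", "Medium", "Large"), ordered = TRUE)
smaller <- size[2] < size[1]   # is "Small" below "Medium"?
largest <- max(size)           # max() is defined for ordered factors

# Numeric quantities: keep plain numbers so arithmetic works directly.
minute_of_day <- c(1, 60, 61)
gaps <- diff(minute_of_day)    # minutes elapsed between observations
```

Comparisons and `max()` work on the ordered factor, but something like `diff()` only makes sense on the numeric version.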

Upvotes: 2
