Reputation: 5068
I want to separate by data into quantiles, same as in this great question
The problem is that I want to do it every day in a time series:
set.seed(123)
temp.all <- data.frame(date = c(rep(Sys.Date() - 1, 12), rep(Sys.Date(), 12)),
name=letters[c(1:12, 1:12)], value=rnorm(24))
At the moment, I'm solving with a for-loop:
library(dplyr)
for (d in unique(temp.all$date)) {
temp = filter(temp.all, date == d)
temp$quartile <- with(temp, factor(
findInterval( val, c(-Inf,
quantile(val, probs=c(0.25, .5, .75)), Inf) , na.rm=TRUE),
labels=c("Q1","Q2","Q3","Q4")
))
# ...and doing rbind on 'temp' to reconstruct temp.all with quartiles
}
Any ideas on how to avoid the dreaded for-loop? Is there maybe a group_by
way of doing this?
Upvotes: 1
Views: 2718
Reputation: 60070
Using group_by
you can just do:
library(lubridate)
temp.all = temp.all %>%
# lubridate::date(date) might be necessary if you have datetimes
group_by(date) %>%
mutate(quartile = cut(value, breaks = 4, labels = paste0("Q", 1:4)))
dplyr
also has a function ntile
which should behave similarly to cut
and should give the same results.
Upvotes: 4
Reputation: 42544
For the sake of completeness (and to correct a wrong answer), here is also a data.table
solution:
library(data.table)
setDT(temp.all)[, quartile := cut(value, breaks = 4L, labels = paste0("Q", 1:4)), by = date]
temp.all
date name value quartile 1: 2017-08-28 a -0.56047565 Q1 2: 2017-08-28 b -0.23017749 Q2 3: 2017-08-28 c 1.55870831 Q4 4: 2017-08-28 d 0.07050839 Q2 5: 2017-08-28 e 0.12928774 Q2 6: 2017-08-28 f 1.71506499 Q4 ... 18: 2017-08-29 f -1.96661716 Q1 19: 2017-08-29 g 0.70135590 Q3 20: 2017-08-29 h -0.47279141 Q2 21: 2017-08-29 i -1.06782371 Q1 22: 2017-08-29 j -0.21797491 Q2 23: 2017-08-29 k -1.02600445 Q2 24: 2017-08-29 l -0.72889123 Q2 date name value quartile
Note that the results are grouped by date
as requested by the OP and not by day(date)
which would aggregate the day of month, e.g., January 1st, February 1st, March 1st, etc.
Further note that only one additional column quartile
is being added to temp.all
in place, i.e., without copying the whole data set in order to save memory and time (which might become relevant when dealing with large data sets).
Upvotes: 1