Reputation: 23
I have a dataframe with daily statistic for a web-site
> head(df,7)
date users sessions goalCompletionsAll dow gos gou
1 2014-08-01 3514 5239 90 Friday 0.01717885 0.02561184
2 2014-08-02 3382 4874 99 Saturday 0.02031186 0.02927262
3 2014-08-03 3981 5499 81 Sunday 0.01472995 0.02034665
4 2014-08-04 4493 6434 99 Monday 0.01538701 0.02203428
5 2014-08-05 4344 6505 111 Tuesday 0.01706380 0.02555249
6 2014-08-06 4091 6117 115 Wednesday 0.01880007 0.02811049
7 2014-08-07 3617 5519 90 Thursday 0.01630730 0.02488250
I needed to find daily averages by the day of the week. This is my attempt to do that:
> daysOfWeek
[1] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday" "Sunday"
dailyAverages <- sapply(daysOfWeek, function (x) {
qq <- filter(df, dow==x)
convRate <- qq$goalCompletionsAll/qq$users
run <- data.frame(mean(convRate),sd(convRate),
max(convRate), min(convRate), median(convRate))
names(run) <- c("Mean", "SD", "Max", "Min", "Median")
run
})
> dailyAverages
Monday Tuesday Wednesday Thursday Friday Saturday
Mean 0.02496614 0.0262649 0.02576256 0.02602963 0.026684 0.02440045
SD 0.003603139 0.004615455 0.003891674 0.004525479 0.00445875 0.004779429
Max 0.03266055 0.03274712 0.03141136 0.03543914 0.03673769 0.033213
Min 0.01853659 0.01748487 0.01904376 0.02026432 0.01734417 0.01593625
Median 0.02488883 0.02651838 0.02629004 0.02543797 0.02599134 0.02502503
Sunday
Mean 0.02426048
SD 0.004086276
Max 0.03112314
Min 0.01581155
Median 0.02456262
This result is almost what I wanted, but it needs to be transposed:
> dx <- t(dailyAverages)
> dx
Mean SD Max Min Median
Monday 0.02496614 0.003603139 0.03266055 0.01853659 0.02488883
Tuesday 0.0262649 0.004615455 0.03274712 0.01748487 0.02651838
Wednesday 0.02576256 0.003891674 0.03141136 0.01904376 0.02629004
Thursday 0.02602963 0.004525479 0.03543914 0.02026432 0.02543797
Friday 0.026684 0.00445875 0.03673769 0.01734417 0.02599134
Saturday 0.02440045 0.004779429 0.033213 0.01593625 0.02502503
Sunday 0.02426048 0.004086276 0.03112314 0.01581155 0.02456262
I wonder, if there a more efficient, non-ugly way to do the same thing?
Upvotes: 1
Views: 452
Reputation: 887971
You could try dplyr
. The chain/pipe operator (%>%
) connects the "lhs" and "rhs" together. The variable "dow" is used as the grouping variable (group_by(..
), compute the "convRate" using transmute
which will drop the existing variables, get the mean
, sd
, etc. of "convRate" using summarise_each
. The advantage of summarise_each
is that it can be used for multiple columns at once.
library(dplyr)
df$dow <- substr(df$dow, 1,3)
res <- df %>%
group_by(dow) %>%
transmute(convRate=goalCompletionsAll/users) %>%
summarise_each(funs(mean, sd, max, min, median), convRate)
indx <- match(c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'), res$dow)
res1 <- res[indx,]
Upvotes: 4