Ingvar Petrov
Ingvar Petrov

Reputation: 23

Filtering dataframe by day of the week

I have a dataframe with daily statistic for a web-site

> head(df,7)
        date users sessions goalCompletionsAll       dow        gos        gou
1 2014-08-01  3514     5239                 90    Friday 0.01717885 0.02561184
2 2014-08-02  3382     4874                 99  Saturday 0.02031186 0.02927262
3 2014-08-03  3981     5499                 81    Sunday 0.01472995 0.02034665
4 2014-08-04  4493     6434                 99    Monday 0.01538701 0.02203428
5 2014-08-05  4344     6505                111   Tuesday 0.01706380 0.02555249
6 2014-08-06  4091     6117                115 Wednesday 0.01880007 0.02811049
7 2014-08-07  3617     5519                 90  Thursday 0.01630730 0.02488250

I needed to find daily averages by the day of the week. This is my attempt to do that:

> daysOfWeek
[1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"    "Saturday"  "Sunday"
dailyAverages <- sapply(daysOfWeek, function (x) {
  qq <- filter(df, dow==x)
  convRate <- qq$goalCompletionsAll/qq$users
  run <- data.frame(mean(convRate),sd(convRate), 
  max(convRate), min(convRate), median(convRate))
  names(run) <- c("Mean", "SD", "Max", "Min", "Median")
  run
})

> dailyAverages
       Monday      Tuesday     Wednesday   Thursday    Friday     Saturday   
Mean   0.02496614  0.0262649   0.02576256  0.02602963  0.026684   0.02440045 
SD     0.003603139 0.004615455 0.003891674 0.004525479 0.00445875 0.004779429
Max    0.03266055  0.03274712  0.03141136  0.03543914  0.03673769 0.033213   
Min    0.01853659  0.01748487  0.01904376  0.02026432  0.01734417 0.01593625 
Median 0.02488883  0.02651838  0.02629004  0.02543797  0.02599134 0.02502503 
       Sunday     
Mean   0.02426048 
SD     0.004086276
Max    0.03112314 
Min    0.01581155 
Median 0.02456262 

This result is almost what I wanted, but it needs to be transposed:

> dx <- t(dailyAverages)
> dx
          Mean       SD          Max        Min        Median    
Monday    0.02496614 0.003603139 0.03266055 0.01853659 0.02488883
Tuesday   0.0262649  0.004615455 0.03274712 0.01748487 0.02651838
Wednesday 0.02576256 0.003891674 0.03141136 0.01904376 0.02629004
Thursday  0.02602963 0.004525479 0.03543914 0.02026432 0.02543797
Friday    0.026684   0.00445875  0.03673769 0.01734417 0.02599134
Saturday  0.02440045 0.004779429 0.033213   0.01593625 0.02502503
Sunday    0.02426048 0.004086276 0.03112314 0.01581155 0.02456262

I wonder, if there a more efficient, non-ugly way to do the same thing?

Upvotes: 1

Views: 452

Answers (1)

akrun
akrun

Reputation: 887971

You could try dplyr. The chain/pipe operator (%>%) connects the "lhs" and "rhs" together. The variable "dow" is used as the grouping variable (group_by(..), compute the "convRate" using transmute which will drop the existing variables, get the mean, sd, etc. of "convRate" using summarise_each. The advantage of summarise_each is that it can be used for multiple columns at once.

library(dplyr)
df$dow <- substr(df$dow, 1,3)
res <- df %>%
          group_by(dow) %>% 
          transmute(convRate=goalCompletionsAll/users) %>% 
          summarise_each(funs(mean, sd, max, min, median), convRate)
indx <- match(c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'), res$dow)
res1 <- res[indx,]

Upvotes: 4

Related Questions