spore234

Reputation: 3650

dplyr summarize date by weekdays

I have multiple observations from different persons on different dates, e.g.

df <- data.frame(id= c(rep(1,5), rep(2,8), rep(3,7)),
                 dates = seq.Date(as.Date("2015-01-01"), by="month", length=20))

Here we have 3 people (id), each with a different number of observations.

I now want to count the Mondays, Tuesdays, etc. for each person.

This should be done with dplyr's summarize, because my real data set has many more columns that I summarize with different statistics.

It should be something like this:

summa <- df %>% group_by(id) %>%
           summarize(mondays = # number of Mondays,
                     tuesdays = # number of Tuesdays,
                     .........)

How can this be achieved?

Upvotes: 0

Views: 2399

Answers (3)

leerssej

Reputation: 14988

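Using base R's date functions (weekdays() returns day names in the current locale, so the strings below assume an English locale):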

summa <- df %>% group_by(id) %>%
    summarise(monday = sum(weekdays(dates) == "Monday"),
              tuesday = sum(weekdays(dates) == "Tuesday"))
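
Because weekdays() returns locale-dependent names, the comparison against "Monday" only matches in an English locale. A minimal sketch of a locale-independent variant, assuming the question's example df, uses as.POSIXlt()$wday, which numbers the days from 0 (Sunday) to 6 (Saturday). The wd helper column and the n_obs column are illustrative names, not part of the original answer:

```r
library(dplyr)

# Example data from the question
df <- data.frame(id = c(rep(1, 5), rep(2, 8), rep(3, 7)),
                 dates = seq.Date(as.Date("2015-01-01"), by = "month", length.out = 20))

summa <- df %>%
  # as.POSIXlt()$wday is numeric: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  mutate(wd = as.POSIXlt(dates)$wday) %>%
  group_by(id) %>%
  summarise(mondays  = sum(wd == 1),
            tuesdays = sum(wd == 2),
            n_obs    = n())   # illustrative extra statistic (hypothetical column)
```

Other per-column statistics can be added to the same summarise() call alongside the weekday counts.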

Upvotes: 1

talat

Reputation: 70336

I would do the following:

summa <- count(df, id, day = weekdays(dates))

# or:
#    summa <- df %>% 
#      mutate(day = weekdays(dates)) %>% 
#      count(id, day)

head(summa)
#Source: local data frame [6 x 3]
#Groups: id [2]
#
#     id        day     n
#  (dbl)      (chr) (int)
#1     1 Donnerstag     1
#2     1    Freitag     1
#3     1   Mittwoch     1
#4     1    Sonntag     2
#5     2   Dienstag     2
#6     2 Donnerstag     1

But you can also reshape to wide format:

library(tidyr)
spread(summa, day, n, fill=0)
#Source: local data frame [3 x 8]
#Groups: id [3]
#
#     id Dienstag Donnerstag Freitag Mittwoch Montag Samstag Sonntag
#  (dbl)    (dbl)      (dbl)   (dbl)    (dbl)  (dbl)   (dbl)   (dbl)
#1     1        0          1       1        1      0       0       2
#2     2        2          1       1        1      1       1       1
#3     3        1          0       2        1      2       0       1

My output is in German because weekdays() returns day names in the current locale; yours will be in your system's language. The column names above are the German weekday names.
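
In newer tidyr releases, spread() is superseded by pivot_wider(). A hedged sketch of the same reshape, assuming a tidyr version in which pivot_wider() is available and values_fill accepts a single value, and using the question's example df (the name wide is illustrative):

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(id = c(rep(1, 5), rep(2, 8), rep(3, 7)),
                 dates = seq.Date(as.Date("2015-01-01"), by = "month", length.out = 20))

# Long format: one row per id/weekday combination (day names follow the locale)
summa <- df %>%
  count(id, day = weekdays(dates))

# Wide format: one row per id, one column per weekday; absent combinations become 0
wide <- pivot_wider(summa, names_from = day, values_from = n, values_fill = 0L)
```

As with spread(), the column names are whatever weekdays() returns on your system.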


If you want to use summarize explicitly, you can get the same long-format result with:

summa <- df %>% 
  group_by(id, day = weekdays(dates)) %>% 
  summarize(n = n())  # or do something with summarise_each() for many columns

Upvotes: 4

Matthieu P.

Reputation: 131

You could use the lubridate package, whose wday() numbers the days from 1 (Sunday) to 7 (Saturday) by default, so Monday is 2:

library(lubridate)

summa <- df %>% group_by(id) %>%
    summarize(mondays = sum(wday(dates) == 2),
    ....
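
The default numbering can be changed through the lubridate.week.start option, so a complete variant may be safer with the week start set explicitly. A minimal sketch, assuming a lubridate version whose wday() has the week_start argument and using the question's example df:

```r
library(dplyr)
library(lubridate)

# Example data from the question
df <- data.frame(id = c(rep(1, 5), rep(2, 8), rep(3, 7)),
                 dates = seq.Date(as.Date("2015-01-01"), by = "month", length.out = 20))

summa <- df %>%
  group_by(id) %>%
  # week_start = 1 makes wday() return 1 for Monday, ..., 7 for Sunday
  summarise(mondays  = sum(wday(dates, week_start = 1) == 1),
            tuesdays = sum(wday(dates, week_start = 1) == 2))
```

The remaining weekdays follow the same pattern with the values 3 to 7.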

Upvotes: 3
