Keep duplicate entries where I use group_by() from dplyr

Question

library(dplyr) ##activates the data.table library

mydataWithWeeksAndWeights <- data_frame(ended = c("14/11/2016",
                                                  "14/11/2016",
                                                  "14/11/2016",
                                                  "02/01/2017",
                                                  "02/01/2017",
                                                  "15/11/2017",
                                                  "15/11/2017",
                                                  "16/11/2017",
                                                  "16/11/2017"),
                                        week = c(46, 46, 46, 1, 1, 46, 46, 46, 46),
                                        satisfactionLevel = c("Very dissatisfied",
                                                              "Very satisfied",
                                                              "Satisfied",
                                                              "Dissatisfied",
                                                              "Very dissatisfied",
                                                              "Very satisfied",
                                                              "Very dissatisfied",
                                                              "Very Satisfied",
                                                              "Very satisfied"),
                                        weight = c(0, 1, 0.75, 0.25, 0, 1, 0, 1, 1))

When I call the following function pivotTable <- mydataWithWeeksAndWeights %>% group_by(week, weight) %>% count(satisfactionLevel) it counts the satisfactionLevel for all week 46 entries. The problem is that the 46th week for the first three rows refers to 2016 with the remaining referring to 2017. I want to keep these duplicate entries.

Arthur Spoon · Accepted Answer

I can't be sure that my code does what you want as you don't give an expected output, but I think what you need to do is add a year column and add it to the group_by so that you differentiate between week 46 of 2016 and week 46 of 2017.

Edit: in case you need to automatically define the year from the end-date that you have, I'm adding in the bit in @docendodiscimus's comment:

library(dplyr)

mydataWithWeeksAndWeights <- data_frame(ended = c("14/11/2016",
                                                  "14/11/2016",
                                                  "14/11/2016",
                                                  "02/01/2017",
                                                  "02/01/2017",
                                                  "15/11/2017",
                                                  "15/11/2017",
                                                  "16/11/2017",
                                                  "16/11/2017"),
                                        week = c(46, 46, 46, 1, 1, 46, 46, 46, 46),
                                        satisfactionLevel = c("Very dissatisfied",
                                                              "Very satisfied",
                                                              "Satisfied",
                                                              "Dissatisfied",
                                                              "Very dissatisfied",
                                                              "Very satisfied",
                                                              "Very dissatisfied",
                                                              "Very Satisfied",
                                                              "Very satisfied"),
                                        weight = c(0, 1, 0.75, 0.25, 0, 1, 0, 1, 1))

mydataWithWeeksAndWeights$year <- format(as.Date(mydataWithWeeksAndWeights$ended,
                                                 "%d/%m/%Y"), "%Y")

pivotTable <- mydataWithWeeksAndWeights %>%
  group_by(week, year, weight) %>%
  count(satisfactionLevel)

Keep duplicate entries where I use group_by() from dplyr

Answers (2)

Related Questions