giac

Reputation: 4309

dplyr group_by_ lazy .drop = F

I am trying to incorporate .drop = F into the following dplyr function:

dspreadN = function(data, ...) {
  data %>% group_by_(.dots = lazyeval::lazy_dots(...), .drop = F) %>%
    summarise(n = n()*100) %>% spread(value, n, fill = 0)
}

Basically, the function transforms this

   id x
1   1 A
2   1 A
3   1 A
4   1 A
5   2 A
6   2 A
7   2 B
8   2 B
9   3 A
10  3 A
11  3 B
12  3 A

into that

    id drop      A     B
  <dbl> <lgl> <dbl> <dbl>
1     1 FALSE   400     0
2     2 FALSE   200   200
3     3 FALSE   300   100

I call the function like this: dff %>% dspreadN(id, value = x)

(My real example is much more complicated, which is why I need the dplyr function.)

What I would like is to keep all the levels of the x variable. Here, the C level is missing from the output:

     id     A     B     C
  <dbl> <dbl> <dbl> <dbl>
1     1   400     0     0
2     2   200   200     0
3     3   300   100     0

Why is .drop = F not working?

library(tidyverse)

# data
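# x must be a factor for unused levels to be kept; since R 4.0.0,
# data.frame() no longer converts strings to factors by default
dff = data.frame(id = c(1,1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4,4), 
                 x = c('A','A','A','A', 'A','A','B','B', 'A','A','B','A', 'C', 'C', 'C', 'C'),
                 stringsAsFactors = TRUE)

# remove the id == 4 rows, so that C remains only as an unused factor level
dff = dff[dff$id != 4, ]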

Upvotes: 0

Views: 246

Answers (1)

Ronak Shah

Reputation: 389335

  • You can use the .drop = FALSE argument in count() instead of group_by_().
  • group_by() + summarise() with n() is equivalent to count().
  • spread() has been superseded by pivot_wider().
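
To illustrate the second point, here is a minimal sketch on made-up data (the `toy` data frame is hypothetical, not from the question). It assumes dplyr >= 1.0 and that x is a factor with an unused level; both routes should then return the same counts, including the zero rows for the empty groups.

```r
library(dplyr)

# Hypothetical toy data: x is a factor whose level "C" never occurs
toy <- data.frame(
  id = c(1, 1, 2),
  x  = factor(c("A", "A", "B"), levels = c("A", "B", "C"))
)

# Route 1: explicit grouping, keeping empty factor levels
via_group_by <- toy %>%
  group_by(id, x, .drop = FALSE) %>%
  summarise(n = n(), .groups = "drop")

# Route 2: count() does the grouping and the tally in one step
via_count <- toy %>%
  count(id, x, .drop = FALSE)

# Compare the two results as plain data frames
all.equal(as.data.frame(via_group_by), as.data.frame(via_count))
```

Note that .drop = FALSE only expands factor columns; a character column has no unused levels to keep.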

Thanks to @Edo for useful tips in improving the post

library(dplyr)
library(tidyr)

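dspreadN = function(data, ...) {
  data %>%
    # forward the grouping columns passed via ... (here: id and x)
    count(..., .drop = FALSE, wt = n() * 100) %>%
    # the column to spread is still assumed to be called x
    pivot_wider(names_from = x, values_from = n, values_fill = 0)
}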

dspreadN(dff, id, x)

#     id     A     B     C
#  <dbl> <dbl> <dbl> <dbl>
#1     1   400     0     0
#2     2   200   200     0
#3     3   300   100     0
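
If the column to spread should not be hard-coded, one possible generalisation is sketched below. The function name `dspread_by_key` and its `key` argument are hypothetical names introduced here for illustration, and the sketch assumes dplyr >= 1.0 and tidyr >= 1.1. It multiplies the counts in a separate mutate() step rather than through wt, which is a stylistic choice and not a requirement.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: `key` is the factor column to spread,
# `...` are the remaining grouping columns
dspread_by_key <- function(data, key, ...) {
  data %>%
    count(..., {{ key }}, .drop = FALSE) %>%   # keep unused factor levels
    mutate(n = n * 100) %>%
    pivot_wider(names_from = {{ key }}, values_from = n, values_fill = 0)
}

# Same shape as the question's data, with x built explicitly as a factor
dff <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  x  = factor(c("A", "A", "A", "A", "A", "A", "B", "B", "A", "A", "B", "A"),
              levels = c("A", "B", "C"))
)

res <- dspread_by_key(dff, x, id)
res
```

Because the unused level C is kept by count(), it ends up as its own all-zero column after pivoting.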

Upvotes: 2
