akhmed

Reputation: 3635

Error when combining group_by, mutate and ifelse: "incompatible types, expecting a logical vector". Is it a bug?

I am running into a strange issue with dplyr when combining group_by, mutate and ifelse. Consider the following data.frame:

df1 <- data.frame(
  crawl.id = c(1, 1, 2, 1, 1, 1),
  group.id = factor(c("1", "2", "2", "3", "3", "3")),
  hits.diff = c(NA, NA, 0, NA, NA, NA)
)
df1
#>   crawl.id group.id hits.diff
#> 1        1        1        NA
#> 2        1        2        NA
#> 3        2        2         0
#> 4        1        3        NA
#> 5        1        3        NA
#> 6        1        3        NA

When I run the following code

library(dplyr)
df1 %>%
  group_by(group.id) %>% 
  mutate( hits.consumed = ifelse(hits.diff<=0,-hits.diff,0) )

For some reason I get

Error: incompatible types, expecting a logical vector

However, if I remove either group_by() or ifelse(), everything works as expected:

df1 %>%
  mutate( hits.consumed = ifelse(hits.diff<=0,-hits.diff,0) )

  crawl.id group.id hits.diff hits.consumed
1        1        1        NA            NA
2        1        2        NA            NA
3        2        2         0             0
4        1        3        NA            NA
5        1        3        NA            NA
6        1        3        NA            NA

df1 %>%
  group_by( group.id ) %>%
  mutate( hits.consumed = -hits.diff )

  crawl.id group.id hits.diff hits.consumed
1        1        1        NA            NA
2        1        2        NA            NA
3        2        2         0             0
4        1        3        NA            NA
5        1        3        NA            NA
6        1        3        NA            NA

Is it a bug or a feature? Can anyone replicate this? What's so special about that specific combination of group_by, mutate and ifelse that makes it fail?

My own research led me here: https://github.com/hadley/dplyr/issues/464 which suggests that it should be fixed by now.

Upvotes: 21

Views: 4800

Answers (1)

thelatemail

Reputation: 93813

Wrap it all in as.numeric to force the output type, so that the NAs, which are logical by default, don't determine the class of the output variable:

df1 %>%
  group_by(group.id) %>% 
  mutate( hits.consumed = as.numeric(ifelse(hits.diff<=0,-hits.diff,0)) )

#  crawl.id group.id hits.diff hits.consumed
#1        1        1        NA            NA
#2        1        2        NA            NA
#3        2        2         0             0
#4        1        3        NA            NA
#5        1        3        NA            NA
#6        1        3        NA            NA
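To see why the coercion helps, here is a small base R sketch (no dplyr needed). The vector names are made up for illustration and only mimic the hits.diff values inside individual groups. It shows that ifelse returns a logical vector when every test value is NA, and that as.numeric brings both cases to the same type:

```r
# Hypothetical stand-ins for the hits.diff values within two groups
all_na <- c(NA_real_, NA_real_)  # a group with only missing values
mixed  <- c(NA_real_, 0)         # a group with one non-missing value

# ifelse() starts from the logical test result and only fills in the
# positions where the test is TRUE or FALSE, so an all-NA test stays logical
res_all_na <- ifelse(all_na <= 0, -all_na, 0)
res_mixed  <- ifelse(mixed <= 0, -mixed, 0)

class(res_all_na)              # "logical"
class(res_mixed)               # "numeric"

# Coercing makes the type the same regardless of the data
class(as.numeric(res_all_na))  # "numeric"
class(as.numeric(res_mixed))   # "numeric"
```

The input column is numeric in both cases; only the result of ifelse differs, which is why the coercion has to wrap the ifelse call rather than the input.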

Pretty sure this is the same issue as in "Custom sum function in dplyr returns inconsistent results", as this result suggests:

out <- df1[1:2,] %>%  mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "logical"
out <- df1[1:3,] %>%  mutate( hits.consumed = ifelse(hits.diff <= 0, -hits.diff, 0))
class(out$hits.consumed)
#[1] "numeric"
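The same check can be run per group in base R. This is only a sketch of what presumably happens when the expression is evaluated group by group; split and sapply stand in for the grouping here and are not what dplyr uses internally:

```r
df1 <- data.frame(
  crawl.id = c(1, 1, 2, 1, 1, 1),
  group.id = factor(c("1", "2", "2", "3", "3", "3")),
  hits.diff = c(NA, NA, 0, NA, NA, NA)
)

# Apply the same ifelse() expression to each group's hits.diff values
per_group <- lapply(
  split(df1$hits.diff, df1$group.id),
  function(x) ifelse(x <= 0, -x, 0)
)

# Class of the result in each group
group_classes <- sapply(per_group, class)
group_classes
#>         1         2         3
#> "logical" "numeric" "logical"
```

Groups "1" and "3" contain only NA and come back as logical, while group "2" comes back as numeric, so the per-group results cannot be combined into one column without a coercion.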

Upvotes: 33
