Alexander

Reputation: 4635

ifelse strange behaviour in grouped data

I have run into a problem where ifelse does not behave as I expect on a grouped data frame. I want to add a new column based on a per-group condition, but only the first element of each group is passed into the new column.

df <- data.frame(ID = c(1, 1, 1, 2, 2, 5), A = c("foo", "bar", "bar", "foo", "foo", "bar"), B = 1:6)

  ID   A B
1  1 foo 1
2  1 bar 2
3  1 bar 3
4  2 foo 4
5  2 foo 5
6  5 bar 6



library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(C = ifelse(length(which(A == 'bar')) >= 2, B, NA))


# A tibble: 6 x 4
# Groups:   ID [3]
     ID      A     B     C
  <dbl> <fctr> <int> <int>
1     1    foo     1     1
2     1    bar     2     1
3     1    bar     3     1
4     2    foo     4    NA
5     2    foo     5    NA
6     5    bar     6    NA

I also tried using do, as suggested in tidyverse/dplyr/issues/489, but it produces the same result.

What is the MATRIX;)

Expected output:

# A tibble: 6 x 4
# Groups:   ID [3]
     ID      A     B     C
  <dbl> <fctr> <int> <int>
1     1    foo     1     1
2     1    bar     2     2
3     1    bar     3     3
4     2    foo     4    NA
5     2    foo     5    NA
6     5    bar     6    NA

Upvotes: 2

Views: 76

Answers (1)

akrun

Reputation: 887088

Here the condition returns a logical vector of length 1 for each 'ID':

df %>% 
     group_by(ID) %>%
     summarise(ind = length(which(A=='bar'))>=2)
# A tibble: 3 x 2
#     ID   ind
#  <dbl> <lgl>
#1     1  TRUE
#2     2 FALSE
#3     5 FALSE

so it is better to use if/else. ifelse returns a result with the same length as its test, so test, yes and no should all be of the same length. Because the test here is a single element per group, ifelse returns only the first element of 'B' (or NA), and mutate then recycles that one value across the entire 'ID' group.

df %>% 
  group_by(ID) %>%
  mutate(C = if(length(which(A=='bar'))>=2) B else NA)
# A tibble: 6 x 4
# Groups:   ID [3]
#     ID      A     B     C
#  <dbl> <fctr> <int> <int>
#1     1    foo     1     1
#2     1    bar     2     2
#3     1    bar     3     3
#4     2    foo     4    NA
#5     2    foo     5    NA
#6     5    bar     6    NA
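The length behaviour described above can be checked outside of dplyr. This is a minimal base R sketch; the vector B and the flag single_test are stand-ins for one group of the data, not names from the original code.

```r
# One group's worth of 'B' and a length-1 condition (illustrative names)
B <- c(1L, 2L, 3L)
single_test <- TRUE

# ifelse follows the shape of 'test', so a length-1 test gives a length-1 result
short <- ifelse(single_test, B, NA)

# Repeating the test to the length of 'B' gives one result per element
full <- ifelse(rep(single_test, length(B)), B, NA)

# if/else simply returns whichever branch is selected, at its full length
branch <- if (single_test) B else NA

print(short)   # 1
print(full)    # 1 2 3
print(branch)  # 1 2 3
```

Inside a grouped mutate, the length-1 result of the first form is what gets recycled over the whole group.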

However, if we still need to use ifelse, then replicate the test to the size of each group with rep and n():

df %>%
  group_by(ID) %>%
  mutate(C = ifelse(rep(length(which(A == 'bar')) >= 2, n()), B, NA))
# A tibble: 6 x 4
# Groups:   ID [3]
#     ID      A     B     C
#  <dbl> <fctr> <int> <int>
#1     1    foo     1     1
#2     1    bar     2     2
#3     1    bar     3     3
#4     2    foo     4    NA
#5     2    foo     5    NA
#6     5    bar     6    NA
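As a possible variation on the if/else approach, the count can be written with sum() on the logical vector instead of length(which()), and a typed NA keeps the column an integer in every group. This is only a sketch: it assumes dplyr is installed, rebuilds df from the question, and the name res is introduced purely for illustration.

```r
library(dplyr)

# Data from the question, rebuilt so the example is self-contained
df <- data.frame(ID = c(1, 1, 1, 2, 2, 5),
                 A = c("foo", "bar", "bar", "foo", "foo", "bar"),
                 B = 1:6)

res <- df %>%
  group_by(ID) %>%
  # sum() counts the TRUE values; NA_integer_ matches the integer type of B
  mutate(C = if (sum(A == "bar") >= 2) B else NA_integer_) %>%
  ungroup()

print(res)
```

The if/else is evaluated once per group, so the single NA_integer_ in the else branch is recycled to the group size by mutate.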

Upvotes: 4
