iod

Reputation: 7592

Transferring value from another column in ifelse within dplyr::mutate

I have dataframe dd (dput at bottom of question):

# A tibble: 6 x 2
# Groups:   Date [5]
  Date     keeper
  <chr>    <lgl> 
1 1/1/2018 TRUE  
2 2/1/2018 TRUE  
3 3/1/2018 FALSE 
4 4/1/2018 FALSE 
5 3/1/2018 TRUE  
6 5/1/2018 TRUE 

Note it is already grouped by Date. I'm trying to create another column that is TRUE when a group has only one row and otherwise copies the value of keeper. That seemed fairly simple, but when I tried this, I got the following result:

dd %>% mutate(moose=ifelse(n()==1,TRUE,keeper))
# A tibble: 6 x 3
# Groups:   Date [5]
  Date     keeper moose
  <chr>    <lgl>  <lgl>
1 1/1/2018 TRUE   TRUE 
2 2/1/2018 TRUE   TRUE 
3 3/1/2018 FALSE  FALSE
4 4/1/2018 FALSE  TRUE 
5 3/1/2018 TRUE   FALSE
6 5/1/2018 TRUE   TRUE 

Note that rows 3 and 5 have the same Date, so they should have just retained what's in keeper for the new column, but they both got turned to FALSE. What am I missing?

Expected output:

  Date     keeper moose
  <chr>    <lgl>  <lgl>
1 1/1/2018 TRUE   TRUE 
2 2/1/2018 TRUE   TRUE 
3 3/1/2018 FALSE  FALSE
4 4/1/2018 FALSE  TRUE 
5 3/1/2018 TRUE   TRUE
6 5/1/2018 TRUE   TRUE 

(note row 5)

Here's the dput for the dataframe:

dd<-structure(list(Date = c("1/1/2018", "2/1/2018", "3/1/2018", "4/1/2018", 
"3/1/2018", "5/1/2018"), keeper = c(TRUE, TRUE, FALSE, FALSE, 
TRUE, TRUE)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), vars = "Date", drop = TRUE, indices = list(
    0L, 1L, c(2L, 4L), 3L, 5L), group_sizes = c(1L, 1L, 2L, 1L, 
1L), biggest_group_size = 2L, labels = structure(list(Date = c("1/1/2018", 
"2/1/2018", "3/1/2018", "4/1/2018", "5/1/2018")), class = "data.frame", row.names = c(NA, 
-5L), vars = "Date", drop = TRUE, indices = list(0L, 1L, 2L, 
    4L, 3L, 5L), group_sizes = c(1L, 1L, 1L, 1L, 1L, 1L), biggest_group_size = 1L, labels = structure(list(
    Date = c("1/1/2018", "2/1/2018", "3/1/2018", "3/1/2018", 
    "4/1/2018", "5/1/2018"), keeper = c(TRUE, TRUE, FALSE, TRUE, 
    FALSE, TRUE)), class = "data.frame", row.names = c(NA, -6L
), vars = c("Date", "keeper"), drop = TRUE, .Names = c("Date", 
"keeper")), .Names = "Date"), .Names = c("Date", "keeper"))
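The dput above was produced by an older dplyr and stores the grouping in pre-0.8.0 attributes (vars, indices, labels), so a current dplyr may refuse it as a corrupt grouped_df. A minimal sketch of an equivalent object, assuming dplyr is installed and rebuilding the data by hand from the printed tibble:

```r
library(dplyr)

# Rebuilt from the printed tibble rather than the old-format dput
dd <- tibble(
  Date   = c("1/1/2018", "2/1/2018", "3/1/2018", "4/1/2018", "3/1/2018", "5/1/2018"),
  keeper = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
) %>%
  group_by(Date)

# Should report 5 groups, matching "# Groups: Date [5]"
n_groups(dd)
```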

ADDENDUM:

As I continue to play with this dataframe, I discovered that if I first create a column n using add_count, and refer to that column in my ifelse instead of to n(), I get the result I'm looking for. What's causing this? Why isn't n() giving me the same result?
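The add_count workaround described here might look like the following sketch. The exact call the OP used isn't shown, so `add_count(Date)` and the object name `res` are assumptions; dd is rebuilt from the printed tibble so the block runs on its own.

```r
library(dplyr)

dd <- tibble(
  Date   = c("1/1/2018", "2/1/2018", "3/1/2018", "4/1/2018", "3/1/2018", "5/1/2018"),
  keeper = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
) %>%
  group_by(Date)

# add_count() stores the group size in a column `n` with one value per row,
# so the ifelse() test below is as long as keeper within each group
res <- dd %>%
  add_count(Date) %>%
  mutate(moose = ifelse(n == 1, TRUE, keeper))

res$moose
```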

Upvotes: 1

Views: 1340

Answers (1)

akrun

Reputation: 887048

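There is a recycling effect. ifelse() returns a result with the same length as its first argument (test), and recycles or truncates the yes and no arguments to that length. Inside each group, n() has length 1, so n() == 1 is a single TRUE/FALSE, while the third argument 'keeper' has length n(). When the test is FALSE, ifelse() therefore returns only the first element of 'keeper', and mutate() recycles that single value to every row of the group. For the 3/1/2018 group the first 'keeper' value is FALSE, so both rows get FALSE. The OP mentioned in the addendum that the issue disappears if a count column is created first. The reason is that once the column is created, the 'n' column has one value per row (length n(), not 1), so the test is as long as 'keeper' and each row keeps its own value.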
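The length rule can be seen in base R alone, without dplyr. This sketch uses the two keeper values of the 3/1/2018 group; the names `keeper`, `one` and `two` are just for illustration.

```r
# The two keeper values of the 3/1/2018 group
keeper <- c(FALSE, TRUE)

# Length-1 test, like n() == 1 inside a group of size 2:
# the result has length 1, so only keeper[1] survives
one <- ifelse(FALSE, TRUE, keeper)
print(one)  # [1] FALSE

# Test with one value per row, like a precomputed n column:
# each row keeps its own keeper value
two <- ifelse(c(FALSE, FALSE), TRUE, keeper)
print(two)  # [1] FALSE  TRUE
```

Inside mutate(), the length-1 result `one` is then recycled to both rows of the group, which is exactly the FALSE/FALSE seen in the question.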

# rep(n(), n()) repeats the group size once per row, so the test has the same length as keeper
dd %>% 
   mutate(moose = ifelse(rep(n(), n()) == 1, TRUE, keeper))
# A tibble: 6 x 3
# Groups:   Date [5]
#  Date     keeper moose
#  <chr>    <lgl>  <lgl>
#1 1/1/2018 TRUE   TRUE 
#2 2/1/2018 TRUE   TRUE 
#3 3/1/2018 FALSE  FALSE
#4 4/1/2018 FALSE  TRUE 
#5 3/1/2018 TRUE   TRUE 
#6 5/1/2018 TRUE   TRUE 

Also, because n() has length 1 within each group, we can use a plain if/else:

# n() is a single value per group, so if/else returns either TRUE or the whole keeper vector
dd %>% 
    mutate(moose = if(n()==1) TRUE else keeper)
# A tibble: 6 x 3
# Groups:   Date [5]
#  Date     keeper moose
#  <chr>    <lgl>  <lgl>
#1 1/1/2018 TRUE   TRUE 
#2 2/1/2018 TRUE   TRUE 
#3 3/1/2018 FALSE  FALSE
#4 4/1/2018 FALSE  TRUE 
#5 3/1/2018 TRUE   TRUE 
#6 5/1/2018 TRUE   TRUE 
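Since the rule is "TRUE for single-row groups, otherwise keeper", it can also be written as a logical OR, which sidesteps ifelse() entirely. This is an alternative not taken from the answer, sketched assuming dplyr is installed; dd is rebuilt from the printed tibble so the block is self-contained.

```r
library(dplyr)

dd <- tibble(
  Date   = c("1/1/2018", "2/1/2018", "3/1/2018", "4/1/2018", "3/1/2018", "5/1/2018"),
  keeper = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
) %>%
  group_by(Date)

# n() == 1 is one TRUE/FALSE per group; `|` recycles it across keeper,
# so single-row groups become TRUE and other rows keep their keeper value
res <- dd %>%
  mutate(moose = n() == 1 | keeper)

res$moose
```

Here recycling works in our favour: `|` stretches the length-1 value to the group size instead of truncating keeper.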

Upvotes: 2
