I have a dataframe dd (dput at bottom of question):
# A tibble: 6 x 2
# Groups: Date [5]
Date keeper
<chr> <lgl>
1 1/1/2018 TRUE
2 2/1/2018 TRUE
3 3/1/2018 FALSE
4 4/1/2018 FALSE
5 3/1/2018 TRUE
6 5/1/2018 TRUE
Note it is already grouped by Date. I'm trying to create another column that will turn "keeper" to TRUE if there's only one row in the group, and otherwise keep the value of keeper. That seemed fairly simple, but when I tried this, I got the following result:
dd %>% mutate(moose=ifelse(n()==1,TRUE,keeper))
# A tibble: 6 x 3
# Groups: Date [5]
Date keeper moose
<chr> <lgl> <lgl>
1 1/1/2018 TRUE TRUE
2 2/1/2018 TRUE TRUE
3 3/1/2018 FALSE FALSE
4 4/1/2018 FALSE TRUE
5 3/1/2018 TRUE FALSE
6 5/1/2018 TRUE TRUE
Note that rows 3 and 5 have the same Date, so they should have just retained what's in keeper for the new column; instead, they both got turned to FALSE. What am I missing?
Expected output:
Date keeper moose
<chr> <lgl> <lgl>
1 1/1/2018 TRUE TRUE
2 2/1/2018 TRUE TRUE
3 3/1/2018 FALSE FALSE
4 4/1/2018 FALSE TRUE
5 3/1/2018 TRUE TRUE
6 5/1/2018 TRUE TRUE
(note row 5)
Here's the dput for the dataframe:
dd<-structure(list(Date = c("1/1/2018", "2/1/2018", "3/1/2018", "4/1/2018",
"3/1/2018", "5/1/2018"), keeper = c(TRUE, TRUE, FALSE, FALSE,
TRUE, TRUE)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), vars = "Date", drop = TRUE, indices = list(
0L, 1L, c(2L, 4L), 3L, 5L), group_sizes = c(1L, 1L, 2L, 1L,
1L), biggest_group_size = 2L, labels = structure(list(Date = c("1/1/2018",
"2/1/2018", "3/1/2018", "4/1/2018", "5/1/2018")), class = "data.frame", row.names = c(NA,
-5L), vars = "Date", drop = TRUE, indices = list(0L, 1L, 2L,
4L, 3L, 5L), group_sizes = c(1L, 1L, 1L, 1L, 1L, 1L), biggest_group_size = 1L, labels = structure(list(
Date = c("1/1/2018", "2/1/2018", "3/1/2018", "3/1/2018",
"4/1/2018", "5/1/2018"), keeper = c(TRUE, TRUE, FALSE, TRUE,
FALSE, TRUE)), class = "data.frame", row.names = c(NA, -6L
), vars = c("Date", "keeper"), drop = TRUE, .Names = c("Date",
"keeper")), .Names = "Date"), .Names = c("Date", "keeper"))
ADDENDUM:
As I continue to play with this dataframe, I discovered that if I first create a column n using add_count, and refer to that column in my ifelse instead of to n(), I get the result I'm looking for. What's causing this? Why isn't n() giving me the same result?
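For reference, the add_count() workaround described above might look like the following sketch. It assumes a current dplyr release and rebuilds dd with tibble() and group_by() rather than using the dput, which carries the grouping attributes of older dplyr versions.

```r
library(dplyr)

# Rebuild the example data as a grouped tibble (same values as the dput)
dd <- tibble(
  Date   = c("1/1/2018", "2/1/2018", "3/1/2018", "4/1/2018", "3/1/2018", "5/1/2018"),
  keeper = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
) %>%
  group_by(Date)

# add_count() stores the group size in a column `n`, one value per row,
# so the test passed to ifelse() has the same length as `keeper`
res <- dd %>%
  add_count(Date) %>%
  mutate(moose = ifelse(n == 1, TRUE, keeper))

res
```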
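There is a recycling effect. ifelse returns a result with the same length as its first argument, the test. Within each group, n() is a single number, so n() == 1 has length 1, and ifelse returns a length-1 result: TRUE when the group has one row, otherwise only the first element of the third argument, 'keeper'. mutate then recycles that single value to every row of the group. For the 3/1/2018 group, the first 'keeper' value is FALSE, so both rows get FALSE. The OP mentioned in the addendum that the issue goes away if a count column is created first. The reason is that once the column is created, 'n' has one value per row (length n() within each group), so the test has the same length as 'keeper'. Repeating n() to the group length with rep has the same effect: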
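The recycling can be reproduced in base R alone. The values below are illustrative stand-ins for the two-row 3/1/2018 group, with group_size playing the role of n() inside that group.

```r
# keeper values for the 3/1/2018 group
keeper <- c(FALSE, TRUE)
group_size <- length(keeper)  # stands in for n() within this group

# Length-1 test: ifelse() returns a length-1 result, which is keeper[1]
short <- ifelse(group_size == 1, TRUE, keeper)

# mutate() then recycles that single value to every row of the group
recycled <- rep(short, group_size)

# A test with one value per row keeps each row's own keeper value
per_row <- ifelse(rep(group_size, group_size) == 1, TRUE, keeper)

short     # [1] FALSE
recycled  # [1] FALSE FALSE
per_row   # [1] FALSE  TRUE
```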
dd %>%
mutate(moose = ifelse(rep(n(), n()) == 1, TRUE, keeper))
# A tibble: 6 x 3
# Groups: Date [5]
# Date keeper moose
# <chr> <lgl> <lgl>
#1 1/1/2018 TRUE TRUE
#2 2/1/2018 TRUE TRUE
#3 3/1/2018 FALSE FALSE
#4 4/1/2018 FALSE TRUE
#5 3/1/2018 TRUE TRUE
#6 5/1/2018 TRUE TRUE
Also, as n() is a single value within each group, n() == 1 is a valid length-1 condition, so we can use a scalar if/else:
dd %>%
mutate(moose = if(n()==1) TRUE else keeper)
# A tibble: 6 x 3
# Groups: Date [5]
# Date keeper moose
# <chr> <lgl> <lgl>
#1 1/1/2018 TRUE TRUE
#2 2/1/2018 TRUE TRUE
#3 3/1/2018 FALSE FALSE
#4 4/1/2018 FALSE TRUE
#5 3/1/2018 TRUE TRUE
#6 5/1/2018 TRUE TRUE
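Another option, not part of the original answer, is to drop the conditional and use a vectorized logical OR, keeper | n() == 1. Within each group the length-1 value n() == 1 is recycled by `|` across that group's keeper values, giving TRUE for one-row groups and keeper otherwise. A sketch, again assuming a current dplyr and rebuilding dd with tibble() and group_by():

```r
library(dplyr)

dd <- tibble(
  Date   = c("1/1/2018", "2/1/2018", "3/1/2018", "4/1/2018", "3/1/2018", "5/1/2018"),
  keeper = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
) %>%
  group_by(Date)

# `==` binds tighter than `|`, so this is keeper | (n() == 1);
# the length-1 right-hand side is recycled to the group size by `|`
res <- dd %>%
  mutate(moose = keeper | n() == 1)

res$moose
```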