Reputation: 29
After grouping my data, I would like to impute values of a column in both directions based on one or more existing values. I have the following data:
id inw test found
001 0 1 <NA>
001 1 2 <NA>
001 1 3 No
001 1 4 No
002 1 1 No
002 0 2 <NA>
002 1 3 Yes
002 1 4 No
003 1 1 Yes
003 1 2 Yes
003 1 3 No
003 0 4 <NA>
The data is grouped by id and the found column is in question here. Essentially, if a value of 'No' appears within a group, all previous values should be changed to 'No'. If 'Yes' appears, all subsequent values should change to 'Yes'.
Therefore, the final result should look like this:
id inw test found
001 0 1 No
001 1 2 No
001 1 3 No
001 1 4 No
002 1 1 No
002 0 2 <NA>
002 1 3 Yes
002 1 4 Yes
003 1 1 Yes
003 1 2 Yes
003 1 3 Yes
003 0 4 Yes
Critically, NA's can remain if there is not a 'No' value anywhere after it it, or no 'Yes' values anywhere before it within the group (highlighted above in group id == 002).
Any help would be greatly appreciated!
Thanks, Dylan
Upvotes: 0
Views: 126
Reputation: 26218
My proposed solution without the need of creating any user defined function
library(tidyverse)
df %>% group_by(id) %>%
mutate(dummy_id = row_number(), #Not sure that there are gaps between test column, thus created one
coln = ifelse(is.na(first(which(found == 'No'))), 0, first(which(found == 'No'))),
coly = ifelse(is.na(first(which(found == 'Yes'))), n()+1, first(which(found == 'Yes'))),
found2 = ifelse(dummy_id >= coly, 'Yes',
ifelse(dummy_id <= coln, 'No', found))) %>%
select(-dummy_id, -coln, -coly) %>%
ungroup()
# A tibble: 12 x 5
id inw test found found2
<int> <int> <int> <chr> <chr>
1 1 0 1 NA No
2 1 1 2 NA No
3 1 1 3 No No
4 1 1 4 No No
5 2 1 1 No No
6 2 0 2 NA NA
7 2 1 3 Yes Yes
8 2 1 4 No Yes
9 3 1 1 Yes Yes
10 3 1 2 Yes Yes
11 3 1 3 No Yes
12 3 0 4 NA Yes
Upvotes: 0
Reputation: 8844
Something like this?
library(dplyr)
fill_na <- function(x) {
if (length(first_yes <- head(which(x == "Yes"), 1L)) > 0L)
x[seq.int(first_yes, length(x), 1L)] <- "Yes"
if (length(last_no <- tail(which(x == "No"), 1L)) > 0L)
x[seq.int(1L, last_no, 1L)] <- "No"
x
}
df %>% group_by(id) %>% mutate(found = fill_na(found))
Output
# A tibble: 12 x 4
# Groups: id [3]
id inw test found
<chr> <dbl> <dbl> <chr>
1 001 0 1 No
2 001 1 2 No
3 001 1 3 No
4 001 1 4 No
5 002 1 1 No
6 002 0 2 NA
7 002 1 3 Yes
8 002 1 4 Yes
9 003 1 1 Yes
10 003 1 2 Yes
11 003 1 3 Yes
12 003 0 4 Yes
Upvotes: 1