Reputation: 1040
I have an example dataset
newdata<-data.frame(Tow.y=c(21,"NA","NA","NA","NA","NA",22,"NA","NA","NA","NA","NA",23,"NA","NA"),Tow=c("NA","NA","NA",21,"NA","NA","NA","NA",22,"NA","NA","NA","NA","NA",23))
newdata$Tow.y<-as.numeric(as.character(newdata$Tow.y))
newdata$Tow<-as.numeric(as.character(newdata$Tow))
newdata1<-newdata %>%
mutate(Station = coalesce(Tow.y, Tow))
newdata1
The resulting code produces:
Tow.y Tow Station
1 21 NA 21
2 NA NA NA
3 NA NA NA
4 NA 21 21
5 NA NA NA
6 NA NA NA
7 22 NA 22
8 NA NA NA
9 NA 22 22
10 NA NA NA
11 NA NA NA
12 NA NA NA
13 23 NA 23
14 NA NA NA
15 NA 23 23
I would like to fill in NAs for NAs between unique values in Station. So NAs in between the two 21 values would be 21, the NAs in between the 22s would be 22, etc. The NAs in between consecutive numbers would remain NAs.
Like this:
Tow.y Tow Station
1 21 NA 21
2 NA NA 21
3 NA NA 21
4 NA 21 21
5 NA NA NA
6 NA NA NA
7 22 NA 22
8 NA NA 22
9 NA 22 22
10 NA NA NA
11 NA NA NA
12 NA NA NA
13 23 NA 23
14 NA NA 23
15 NA 23 23
I have tried the na.locf function in the zoo package, but that replaces all NA values.
newdata1$Station2<-na.locf(newdata1$Station,na.rm = F)
Other examples I have looked at show that you can use na.locf with a group variable, but I dont have a grouping variable that is complete for the data set. Does anyone have a method for filling in the NAs where I need them to be filled in.
Upvotes: 2
Views: 145
Reputation: 886948
Based on the example, it seems that the 'Tow' and 'Tow.y' values match in a 'start', 'end' way. In that case, we can use base R
methods.
Create a sequence index ('i1') to replicate the non-NA elements in 'Tow' (or 'Tow.y') for the 'Station' column. The 'lst' returns a list
of numeric index, which is used to assign the values to 'Station'
lst <- do.call(Map, c(f = seq, unname(lapply(newdata,
function(x) seq_along(x)[!is.na(x)]))))
i1 <- unlist(lst)
newdata$Station[i1] <- rep(na.omit(newdata$Tow), lengths(lst))
newdata
# Tow.y Tow Station
#1 21 NA 21
#2 NA NA 21
#3 NA NA 21
#4 NA 21 21
#5 NA NA NA
#6 NA NA NA
#7 22 NA 22
#8 NA NA 22
#9 NA 22 22
#10 NA NA NA
#11 NA NA NA
#12 NA NA NA
#13 23 NA 23
#14 NA NA 23
#15 NA 23 23
Or using the same logic with tidyverse
library(tidyverse)
newdata %>%
mutate_all(funs(row_number() * !is.na(.))) %>%
map( ~ .x[.x!=0]) %>%
transpose %>%
map(reduce, `:`) %>%
set_names(na.omit(newdata$Tow)) %>%
stack %>%
right_join(newdata %>% mutate(values = row_number())) %>%
rename(Station = ind) %>%
ungroup %>%
select(names(newdata), everything(), -values)
# Tow.y Tow Station
#1 21 NA 21
#2 NA NA 21
#3 NA NA 21
#4 NA 21 21
#5 NA NA <NA>
#6 NA NA <NA>
#7 22 NA 22
#8 NA NA 22
#9 NA 22 22
#10 NA NA <NA>
#11 NA NA <NA>
#12 NA NA <NA>
#13 23 NA 23
#14 NA NA 23
#15 NA 23 23
Upvotes: 1
Reputation: 145755
Here's a good way. I left the helper columns in to demonstrate how it works, but you can easily remove them with a select
.
newdata1 %>%
mutate(from_first = zoo::na.locf(Station, na.rm = FALSE),
from_last = zoo::na.locf(Station, na.rm = FALSE, fromLast = TRUE),
result = if_else(from_first == from_last, from_first, Station))
# Tow.y Tow Station from_first from_last result
# 1 21 NA 21 21 21 21
# 2 NA NA NA 21 21 21
# 3 NA NA NA 21 21 21
# 4 NA 21 21 21 21 21
# 5 NA NA NA 21 22 NA
# 6 NA NA NA 21 22 NA
# 7 22 NA 22 22 22 22
# 8 NA NA NA 22 22 22
# 9 NA 22 22 22 22 22
# 10 NA NA NA 22 23 NA
# 11 NA NA NA 22 23 NA
# 12 NA NA NA 22 23 NA
# 13 23 NA 23 23 23 23
# 14 NA NA NA 23 23 23
# 15 NA 23 23 23 23 23
Upvotes: 4