fill in NAs in dataframe between values

Question

I have an example dataset

newdata<-data.frame(Tow.y=c(21,"NA","NA","NA","NA","NA",22,"NA","NA","NA","NA","NA",23,"NA","NA"),Tow=c("NA","NA","NA",21,"NA","NA","NA","NA",22,"NA","NA","NA","NA","NA",23))
newdata$Tow.y<-as.numeric(as.character(newdata$Tow.y))
newdata$Tow<-as.numeric(as.character(newdata$Tow))

newdata1<-newdata %>% 
  mutate(Station = coalesce(Tow.y, Tow))
newdata1

The resulting code produces:

Tow.y Tow Station
1     21  NA      21
2     NA  NA      NA
3     NA  NA      NA
4     NA  21      21
5     NA  NA      NA
6     NA  NA      NA
7     22  NA      22
8     NA  NA      NA
9     NA  22      22
10    NA  NA      NA
11    NA  NA      NA
12    NA  NA      NA
13    23  NA      23
14    NA  NA      NA
15    NA  23      23

I would like to fill in NAs for NAs between unique values in Station. So NAs in between the two 21 values would be 21, the NAs in between the 22s would be 22, etc. The NAs in between consecutive numbers would remain NAs.

Like this:

  Tow.y Tow Station
1     21  NA      21
2     NA  NA      21
3     NA  NA      21
4     NA  21      21
5     NA  NA      NA
6     NA  NA      NA
7     22  NA      22
8     NA  NA      22
9     NA  22      22
10    NA  NA      NA
11    NA  NA      NA
12    NA  NA      NA
13    23  NA      23
14    NA  NA      23
15    NA  23      23

I have tried the na.locf function in the zoo package, but that replaces all NA values.

newdata1$Station2<-na.locf(newdata1$Station,na.rm = F)

Other examples I have looked at show that you can use na.locf with a group variable, but I dont have a grouping variable that is complete for the data set. Does anyone have a method for filling in the NAs where I need them to be filled in.

Gregor Thomas · Accepted Answer

Here's a good way. I left the helper columns in to demonstrate how it works, but you can easily remove them with a select.

newdata1 %>%
  mutate(from_first = zoo::na.locf(Station, na.rm = FALSE),
         from_last = zoo::na.locf(Station, na.rm = FALSE, fromLast = TRUE),
         result = if_else(from_first == from_last, from_first, Station))
#    Tow.y Tow Station from_first from_last result
# 1     21  NA      21         21        21     21
# 2     NA  NA      NA         21        21     21
# 3     NA  NA      NA         21        21     21
# 4     NA  21      21         21        21     21
# 5     NA  NA      NA         21        22     NA
# 6     NA  NA      NA         21        22     NA
# 7     22  NA      22         22        22     22
# 8     NA  NA      NA         22        22     22
# 9     NA  22      22         22        22     22
# 10    NA  NA      NA         22        23     NA
# 11    NA  NA      NA         22        23     NA
# 12    NA  NA      NA         22        23     NA
# 13    23  NA      23         23        23     23
# 14    NA  NA      NA         23        23     23
# 15    NA  23      23         23        23     23

fill in NAs in dataframe between values

Answers (2)

Related Questions