niceguy

Reputation: 157

Replace 0 values in a data frame with the nearest non-zero value

I'm sourcing some data that sometimes contains runs of consecutive 0 values, often 5 or more in a row (the run length varies). For my purposes, it would be great if I could replace these zeros with the nearest non-zero value.

library(dplyr)

n = n %>% mutate(impliedearningsmov = if_else(impliedearningsmov==0,
                                                if_else(lead(impliedearningsmov)==0,
                                                        if_else(lead(impliedearningsmov,2)==0,
                                                                if_else(lead(impliedearningsmov,3)==0,
                                                                        if_else(lead(impliedearningsmov,4)==0,
                                                                                if_else(lead(impliedearningsmov,5)==0,
                                                                                        if_else(lead(impliedearningsmov,6)==0,
                                                                                                lead(impliedearningsmov,7),
                                                                                                lead(impliedearningsmov,6)),
                                                                                        lead(impliedearningsmov,5)),
                                                                                lead(impliedearningsmov,4)),
                                                                        lead(impliedearningsmov,3)),
                                                                lead(impliedearningsmov,2)),
                                                        lead(impliedearningsmov)),
                                                impliedearningsmov))

My solution feels very redundant, and it only works for runs of up to seven zeros. It doesn't matter whether I use the lead or the lagged value. I know how to fill with the mean or median, but I need the nearest non-zero value. Thanks!

The data is just a numeric vector, for example:

3.84
3.54
3.94
3.72
0.00
0.00
0.00
5.88
3.26
4.37
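If "nearest" should really mean the closest non-zero value in either direction (the answers fill from the previous value only), a minimal base-R sketch could look like this. The function name nearest_nonzero is hypothetical, and it assumes a tie (a zero equally far from a non-zero value on each side) should resolve to the earlier value.

```r
# Hypothetical helper: replace every 0 with the closest non-zero value,
# looking both backwards and forwards. Ties go to the earlier value
# because nz is sorted and which.min() returns the first minimum.
nearest_nonzero <- function(x) {
  nz <- which(x != 0)              # positions of non-zero values (NAs skipped)
  if (length(nz) == 0) return(x)   # nothing to copy from
  for (i in which(x == 0)) {
    nearest <- nz[which.min(abs(nz - i))]
    x[i] <- x[nearest]
  }
  x
}

x <- c(3.84, 3.54, 3.94, 3.72, 0, 0, 0, 5.88, 3.26, 4.37)
nearest_nonzero(x)
# [1] 3.84 3.54 3.94 3.72 3.72 3.72 5.88 5.88 3.26 4.37
```

Inside a pipeline it would be used as mutate(impliedearningsmov = nearest_nonzero(impliedearningsmov)). The loop compares each zero against every non-zero position, which is simple but may be slow on very long vectors.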

Upvotes: 1

Views: 417

Answers (2)

Ronak Shah

Reputation: 388982

Replace 0 with NA, then fill the gaps with either tidyr::fill:

library(dplyr)
library(tidyr)

df %>%
  mutate(a = replace(a, a == 0, NA)) %>%  # turn zeros into missing values
  fill(a)                                 # carry the last non-NA value down
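One caveat: fill() copies values downwards by default, so zeros at the very start of the column stay NA. tidyr's .direction argument can cover that case; with "downup", interior gaps still take the previous value and only leading gaps are filled from below. A small sketch on made-up data (the values are hypothetical):

```r
library(dplyr)
library(tidyr)

# Hypothetical data that starts with a zero
df <- tibble(a = c(0, 3.84, 3.72, 0, 0, 0, 5.88))

out <- df %>%
  mutate(a = replace(a, a == 0, NA)) %>%  # zeros become NA
  fill(a, .direction = "downup")          # fill down first, then up

out$a
# [1] 3.84 3.84 3.72 3.72 3.72 3.72 5.88
```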

Or zoo::na.locf (with na.rm = FALSE, so leading zeros don't shorten the column):

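df %>%
  mutate(a = replace(a, a == 0, NA),
         # na.rm = FALSE keeps leading NAs instead of dropping them,
         # so the column keeps its length inside mutate()
         a = zoo::na.locf(a, na.rm = FALSE))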

#      a
#   <dbl>
# 1     1
# 2     2
# 3     3
# 4     4
# 5     5
# 6     5
# 7     5
# 8     5
# 9     1
#10     2
# … with 12 more rows
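The na.rm argument matters here: na.locf() defaults to na.rm = TRUE, which drops leading NAs and returns a shorter vector, so mutate() errors when the column starts with 0. A sketch, assuming the zoo package is installed and using made-up values, that keeps the length and then back-fills any leading gap with fromLast = TRUE:

```r
library(zoo)

a <- c(0, 0, 1, 2, 0, 0, 3)   # hypothetical data
a[a == 0] <- NA

length(na.locf(a))                 # 5: the two leading NAs are dropped
fwd <- na.locf(a, na.rm = FALSE)   # NA NA 1 2 2 2 3, length kept
# Back-fill whatever is still missing (only the leading NAs at this point)
filled <- na.locf(fwd, fromLast = TRUE, na.rm = FALSE)
filled
# [1] 1 1 1 2 2 2 3
```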

Upvotes: 2

Baraliuh

Reputation: 2141

You could first define a helper function that does this:

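zero_replacer <- function(x) {
    # Find positions of zero values
    is_zero <- which(x == 0)
    # Label stretches of consecutive zeros: a new stretch starts whenever
    # a zero position is not exactly one past the previous zero position
    zero_stretches <- cumsum(!(dplyr::lag(is_zero, default = is_zero[1] - 1) + 1 == is_zero)) + 1
    # Replace each stretch with the value just before it, or with the value
    # just after it when the stretch starts at position 1
    for (i in unique(zero_stretches)) {
        idx_zero <- is_zero[zero_stretches == i]
        idx_val <- idx_zero[1] - 1
        if (idx_val < 1) idx_val <- idx_zero[length(idx_zero)] + 1
        # Skip if the whole vector is zero (no value to copy from)
        if (idx_val <= length(x)) x[idx_zero] <- x[idx_val]
    }
    x
}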

Then you can use this function in mutate like this:

library(dplyr)  # attaches %>%

tibble::tibble(# Generate some dummy data
    a = c(1:5, 0, 0, 0, 1:4, rep(0, times = 10))
) %>% 
    dplyr::mutate(
        #Replace zero with prev non-zero
        a = zero_replacer(a)
    )

Output:

# A tibble: 22 x 1
       a
   <dbl>
 1     1
 2     2
 3     3
 4     4
 5     5
 6     5
 7     5
 8     5
 9     1
10     2
11     3
12     4
13     4
14     4
15     4
16     4
17     4
18     4
19     4
20     4
21     4
22     4
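If you'd rather avoid the explicit loop, the same "previous non-zero value" fill can be vectorised in base R with cummax(). This is a sketch: the function name fill_prev_nonzero is hypothetical, it assumes x contains no NA values, and leading zeros are left as 0 because there is no earlier value to copy.

```r
# Hypothetical helper: replace each 0 with the most recent non-zero value.
# Assumes x has no NAs (x != 0 would give NA and cummax() would propagate it).
fill_prev_nonzero <- function(x) {
  # Index of the latest non-zero position seen so far (0 if none yet)
  idx <- cummax(seq_along(x) * (x != 0))
  out <- x
  has_prev <- idx > 0           # leading zeros have nothing to copy from
  out[has_prev] <- x[idx[has_prev]]
  out
}

a <- c(1:5, 0, 0, 0, 1:4, rep(0, times = 10))
fill_prev_nonzero(a)
```

It drops into the same pipeline as mutate(a = fill_prev_nonzero(a)).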

Upvotes: 1
