Reputation: 1089

Fill NAs in column with either 0 or value n above repeated n times

I have a data frame that looks like:

df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                 y = c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA))

I want it to look like this:

data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
           y = c(0, 2, 2, 0, 0, 3, 3, 3, 0, 1, 0, 0))

#>     x y
#> 1   1 0
#> 2   2 2
#> 3   3 2
#> 4   4 0
#> 5   5 0
#> 6   6 3
#> 7   7 3
#> 8   8 3
#> 9   9 0
#> 10 10 1
#> 11 11 0
#> 12 12 0

I have solved this with a while loop, but I am looking for a more R-like solution.

This is the loop solution:

df[is.na(df)] <- 0 # replace all NA with 0

i = 1
while (i < nrow(df)){
  
  if (df$y[i] < 2){ # nothing to fill if y is 0 or 1
    i = i+1
  } else {
    df$y[(i+1):(i+df$y[i]-1)] <- df$y[i]
    i = i+df$y[i]
  }
}

Bonus question: could it be done within a pipe and for multiple columns (e.g. a column z = c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA))?

Upvotes: 3

Answers (3)

jkatam

Reputation: 3447

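Alternatively, try the code below, which does not need a custom function:

library(dplyr)
library(tidyr)

df2 <- df %>% mutate(z=y) %>% fill(z) %>% group_by(y,z) %>%
 mutate(row=row_number()+1, y=ifelse(z>=row,z,y)) %>% ungroup() %>% 
 mutate(y=replace_na(y, 0)) %>% # rows not reached by a fill are still NA here
select(-z,-row)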

Upvotes: 0

G. Grothendieck

Reputation: 269694

Group the rows so that each non-NA starts a new group and then for each such group if the first element is NA then output 0's and otherwise output the first element that many times followed by 0's. This uses base R only but if you prefer dplyr replace transform with mutate and all else stays the same.

f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])
transform(df, y = ave(y, cumsum(!is.na(y)), FUN = f))

This gives the desired output shown in the question.
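
To see why the grouping works, the intermediate vectors can be inspected on their own. This is a minimal sketch using the df from the question; the names g and res are introduced here only for illustration.

```r
df <- data.frame(x = 1:12,
                 y = c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA))

# each non-NA value starts a new group; leading NAs fall into group 0
g <- cumsum(!is.na(df$y))
g
#> [1] 0 1 1 1 1 2 2 2 2 3 3 3

f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])

# ave() applies f within each group and returns the results in the original row order
res <- ave(df$y, g, FUN = f)
res
#> [1] 0 2 2 0 0 3 3 3 0 1 0 0
```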

If there are several columns, let ix contain the numbers or names of the columns to be processed and, using the same f as above, run it over each of those columns.

ix <- "y"
f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])
f2 <- function(i) ave(df[[i]], cumsum(!is.na(df[[i]])), FUN = f)
replace(df, ix, lapply(ix, f2))
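
As an illustration, the same code can be run with the column z from the question's bonus added to df and ix naming both columns. This is a sketch under that assumption; out is a name introduced here only for the example.

```r
df <- data.frame(x = 1:12,
                 y = c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA),
                 z = c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA))

ix <- c("y", "z")  # columns to process
f <- function(x) if (is.na(x[1])) 0 else ifelse(seq_along(x) > x[1], 0, x[1])
f2 <- function(i) ave(df[[i]], cumsum(!is.na(df[[i]])), FUN = f)

# replace() returns a copy of df with the listed columns swapped for the filled ones
out <- replace(df, ix, lapply(ix, f2))
out$y
#> [1] 0 2 2 0 0 3 3 3 0 1 0 0
out$z
#> [1] 1 0 0 0 4 4 4 4 0 2 2 0
```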

Upvotes: 0

Maël

Reputation: 52069

You can create a zero-filled vector with numeric, get the values to repeat with complete.cases and rep, and get the positions to fill with complete.cases and sequence:

fill_n_repeat <- function(x){
  value = x[complete.cases(x)]
  idx = which(complete.cases(x))
  v = numeric(length(x)) 
  v[sequence(value, idx)] <- rep(value, value)
  v
}

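library(dplyr)
# add column z from the bonus question so that y:z selects both columns
df$z <- c(1, NA, NA, NA, 4, NA, NA, NA, NA, 2, NA, NA)
df %>% 
  mutate(across(y:z, fill_n_repeat))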
#>     x y z
#> 1   1 0 1
#> 2   2 2 0
#> 3   3 2 0
#> 4   4 0 0
#> 5   5 0 4
#> 6   6 3 4
#> 7   7 3 4
#> 8   8 3 4
#> 9   9 0 0
#> 10 10 1 2
#> 11 11 0 2
#> 12 12 0 0
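
The intermediate vectors make the indexing easier to follow. This is a sketch on the y column from the question; pos is a name introduced here only for illustration. Note that, as far as I know, sequence() accepts a second (from) argument only in R 4.0.0 or later.

```r
y <- c(NA, 2, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA)

value <- y[complete.cases(y)]    # the non-NA values: 2, 3, 1
idx <- which(complete.cases(y))  # their positions: 2, 6, 10

# for each value n at position i, generate the positions i, i+1, ..., i+n-1
pos <- sequence(value, idx)
pos
#> [1]  2  3  6  7  8 10

# the values written to those positions: each n repeated n times
rep(value, value)
#> [1] 2 2 3 3 3 1
```

Every position not listed in pos keeps the 0 it received from numeric(length(x)).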

Upvotes: 2
