jeroen81

Reputation: 2425

Add missing value in column with value from row above

Every week I receive an incomplete dataset for an analysis. It looks like this:

df1 <- data.frame(var1 = c("a","","","b",""),
                  var2 = c("x","y","z","x","z"))

Some var1 values are missing. The dataset should end up looking like this:

df2 <- data.frame(var1 = c("a","a","a","b","b"),
                  var2 = c("x","y","z","x","z"))

Currently I use an Excel macro to do this, but that makes it harder to automate the analysis. From now on I would like to do this in R, but I have no idea how.

Thanks for your help.

QUESTION UPDATE AFTER COMMENT

var2 is not relevant for my question. The only thing I am trying to do is get from df1 to df2.

df1 <- data.frame(var1 = c("a","","","b",""))
df2 <- data.frame(var1 = c("a","a","a","b","b"))

Upvotes: 24

Views: 13973

Answers (5)

jeroen81

Reputation: 2425

The tidyr package has the fill() function, which does the trick. Note that it fills NA values, not empty strings.

library(tidyr)

df1 <- data.frame(var1 = c("a",NA,NA,"b",NA), stringsAsFactors = FALSE)
df1 %>% fill(var1)
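
Since the data in the question uses empty strings rather than NA, here is a minimal sketch of the full round trip, assuming var1 is a character column (hence stringsAsFactors = FALSE). The name `filled` is only illustrative.

```r
library(tidyr)

df1 <- data.frame(var1 = c("a", "", "", "b", ""),
                  var2 = c("x", "y", "z", "x", "z"),
                  stringsAsFactors = FALSE)

# fill() only treats NA as missing, so turn empty strings into NA first
df1$var1[df1$var1 == ""] <- NA

# Carry the last non-missing value downward (the default direction)
filled <- fill(df1, var1)
filled$var1
```

A leading blank has no value above it to copy, so it would stay NA with this approach.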

Upvotes: 14

ice024

Reputation: 36

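Below is my unfill function, which does the reverse: it replaces values that repeat the row above with NA. I encountered the same problem and hope it helps.

library(dplyr)  # %>%, lag(), bind_cols(), select(), one_of()
library(purrr)  # map_df()

unfill <- function(df,cols){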
  col_names <- names(df)
  unchanged <- df[!(names(df) %in% cols)]
  changed <- df[names(df) %in% cols] %>%
    map_df(function(col){
      col[col == col %>% lag()] <- NA
      col
    })
  unchanged %>% bind_cols(changed) %>% select(one_of(col_names))
}

Upvotes: 1

Sacha Epskamp

Reputation: 47632

Here is another way which is slightly shorter and doesn't coerce to character:

Fill <- function(x,missing="")
{
  Log <- x != missing
  y <- x[Log]
  y[cumsum(Log)]
}

Results:

# For factor:
Fill(df1$var1)
[1] a a a b b
Levels:  a b

# For character:
Fill(as.character(df1$var1))
[1] "a" "a" "a" "b" "b"
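
The cumsum() indexing above assumes the first element is not missing; otherwise the index 0 is silently dropped and the result comes out shorter than the input. A hedged variant that keeps the length and also treats NA as missing might look like this (`fill_down` is a hypothetical name, not part of the original answer):

```r
fill_down <- function(x, missing = "") {
  # TRUE where x holds a real value (neither NA nor the missing marker)
  is_value <- !is.na(x) & x != missing
  # Position of the most recent real value; 0 before the first one
  idx <- cumsum(is_value)
  # Leading gaps have nothing to copy from, so index them with NA
  idx[idx == 0] <- NA
  x[is_value][idx]
}

res_plain   <- fill_down(c("a", "", "", "b", ""))
res_leading <- fill_down(c("", "a", "", NA, "b"))
```

Leading gaps come back as NA instead of being dropped, so the result can be assigned straight back to the column.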

Upvotes: 7

Andrei

Reputation: 2665

Here is a simpler way:

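library(zoo)
df1$var1[df1$var1 == ""] <- NA
# na.rm = FALSE keeps a leading NA instead of dropping it, so the length still matches
df1$var1 <- na.locf(df1$var1, na.rm = FALSE)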

Upvotes: 20

Andrie

Reputation: 179558

Here is one way of doing it, making use of run-length encoding (rle) and its inverse, inverse.rle:

fillTheBlanks <- function(x, missing=""){
  rle <- rle(as.character(x))
  empty <- which(rle$values==missing)
  rle$values[empty] <- rle$values[empty-1]
  inverse.rle(rle)
}

df1$var1 <- fillTheBlanks(df1$var1)

The results:

df1

  var1 var2
1    a    x
2    a    y
3    a    z
4    b    x
5    b    z
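
To see why this works, it may help to look at the intermediate run-length encoding. The following is a small sketch on a plain character vector; the names `x`, `runs`, `blank` and `filled` are only illustrative.

```r
x <- c("a", "", "", "b", "")

runs <- rle(x)
runs$lengths  # lengths of the consecutive runs
runs$values   # the value of each run; blanks form runs of their own

# Overwrite each blank run's value with the value of the run before it.
# Assumes the vector does not start with a blank (index 0 would be dropped).
blank <- which(runs$values == "")
runs$values[blank] <- runs$values[blank - 1]

# Expand the runs back into a full-length vector
filled <- inverse.rle(runs)
```

Because consecutive blanks collapse into a single run, one assignment per run is enough no matter how many blank rows follow a value.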

Upvotes: 25
