Reputation: 2425
Every week I a incomplete dataset for a analysis. That looks like:
df1 <- data.frame(var1 = c("a","","","b",""),
var2 = c("x","y","z","x","z"))
Some var1 values are missing. The dataset should end up looking like this:
df2 <- data.frame(var1 = c("a","a","a","b","b"),
var2 = c("x","y","z","x","z"))
Currently I use an Excel macro to do this. But this makes it harder to automate the analysis. From now on I would like to do this in R. But I have no idea how to do this.
Thanks for your help.
QUESTION UPDATE AFTER COMMENT
var2 is not relevant for my question. The only thing I am trying to is. Get from df1 to df2.
df1 <- data.frame(var1 = c("a","","","b",""))
df2 <- data.frame(var1 = c("a","a","a","b","b"))
Upvotes: 24
Views: 13973
Reputation: 2425
The tidyr packages has the fill()
function which does the trick.
df1 <- data.frame(var1 = c("a",NA,NA,"b",NA), stringsAsFactors = FALSE)
df1 %>% fill(var1)
Upvotes: 14
Reputation: 36
Below is my unfill function, encontered same problem, hope will help.
unfill <- function(df,cols){
col_names <- names(df)
unchanged <- df[!(names(df) %in% cols)]
changed <- df[names(df) %in% cols] %>%
map_df(function(col){
col[col == col %>% lag()] <- NA
col
})
unchanged %>% bind_cols(changed) %>% select(one_of(col_names))
}
Upvotes: 1
Reputation: 47632
Here is another way which is slightly shorter and doesn't coerce to character:
Fill <- function(x,missing="")
{
Log <- x != missing
y <- x[Log]
y[cumsum(Log)]
}
Results:
# For factor:
Fill(df1$var1)
[1] a a a b b
Levels: a b
# For character:
Fill(as.character(df1$var1))
[1] "a" "a" "a" "b" "b"
Upvotes: 7
Reputation: 2665
Here is a simpler way:
library(zoo)
df1$var1[df1$var1 == ""] <- NA
df1$var1 <- na.locf(df1$var1)
Upvotes: 20
Reputation: 179558
Here is one way of doing it by making use of run-length encoding (rle
) and its inverse rle.inverse
:
fillTheBlanks <- function(x, missing=""){
rle <- rle(as.character(x))
empty <- which(rle$value==missing)
rle$values[empty] <- rle$value[empty-1]
inverse.rle(rle)
}
df1$var1 <- fillTheBlanks(df1$var1)
The results:
df1
var1 var2
1 a x
2 a y
3 a z
4 b x
5 b z
Upvotes: 25