How to fill missing values with multiple columns in R

Question

I have a dataset of movies with several columns listing actors/actresses appearing in the movie. The data is messy and sometimes the first column contains a missing value but the second contains an actor's name. I want to keep all the actor columns but move each non-missing value to the earliest column. For example:

movies <- data.frame(actor1=c("A","B",NA,"C",NA), actor2=c(NA, "Z", "W", NA, "X"), actor3=c("L","M","N","O","P"))

  actor1 actor2 actor3
1      A   <NA>      L
2      B      Z      M
3   <NA>      W      N
4      C   <NA>      O
5   <NA>      X      P

Should become:

  actor1 actor2 actor3
1      A      L   <NA>
2      B      Z      M
3      W      N   <NA>
4      C      O   <NA>
5      X      P   <NA>

coalesce() will pull W and X to the first column. Perfect. But how do I do the same for subsequent columns? For example, since W was pulled from actor2 to actor1, I now want the third row of actor2 to have the value N, not W.

akrun · Accepted Answer

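One option is to use apply with MARGIN = 1 to loop over the rows and, for each row, concatenate (c) the non-NA elements followed by the NA elements. Because apply returns each row's result as a column, the result is transposed with t before it is assigned back: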

movies[] <- t(apply(movies, 1, function(x) c(x[!is.na(x)], x[is.na(x)])))
movies
#  actor1 actor2 actor3
#1      A      L   <NA>
#2      B      Z      M
#3      W      N   <NA>
#4      C      O   <NA>
#5      X      P   <NA>
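
To see why this works, it may help to look at a single row and at the shape of what apply returns. The snippet below is only an illustrative sketch: the names x, shifted and res are made up here and are not part of the answer above.

```r
# One row of the example data, as apply() would pass it to the function
x <- c(actor1 = NA, actor2 = "W", actor3 = "N")

# Non-missing values first, missing values last
shifted <- c(x[!is.na(x)], x[is.na(x)])
unname(shifted)
# [1] "W" "N" NA

# Looping over rows gives one *column* per input row, hence the t()
movies <- data.frame(actor1 = c("A", "B", NA, "C", NA),
                     actor2 = c(NA, "Z", "W", NA, "X"),
                     actor3 = c("L", "M", "N", "O", "P"))
res <- apply(movies, 1, function(x) c(x[!is.na(x)], x[is.na(x)]))
dim(res)
# [1] 3 5
```

The relative order of the non-missing values within each row is preserved; only the NA entries move to the end.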

If only a subset of the columns should be rearranged, build a logical index of those columns with startsWith

i1 <- startsWith(names(movies), "actor")

and update only those columns

movies[i1] <- t(apply(movies[i1], 1, function(x) c(x[!is.na(x)], x[is.na(x)])))
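
Restricting the update to the actor columns matters when the data frame also holds columns of other types, because apply first converts its input to a matrix and would otherwise coerce everything to character. Here is a small sketch with a made-up data frame; the title and year columns and their values are assumptions for illustration, not from the question.

```r
# Hypothetical data: two non-actor columns surround the actor columns
movies <- data.frame(title  = c("T1", "T2", "T3"),
                     actor1 = c(NA, "B", NA),
                     actor2 = c("W", NA, NA),
                     actor3 = c("N", "M", "P"),
                     year   = c(1999, 2004, 2010))

# TRUE only for columns whose name begins with "actor"
i1 <- startsWith(names(movies), "actor")

# Shift non-missing values left within the selected columns only
movies[i1] <- t(apply(movies[i1], 1, function(x) c(x[!is.na(x)], x[is.na(x)])))

movies
# title and year are untouched; year is still numeric
```

After the update, actor1 holds "W", "B", "P", and actor3 is NA in every row of this example.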
