James Martherus
James Martherus

Reputation: 1043

How to fill missing values with multiple columns in R

I have a dataset of movies with several columns listing actors/actresses appearing in the movie. The data is messy and sometimes the first column contains a missing value but the second contains an actor's name. I want to keep all the actor columns but move each non-missing value to the earliest column. For example:

movies <- data.frame(actor1=c("A","B",NA,"C",NA), actor2=c(NA, "Z", "W", NA, "X"), actor3=c("L","M","N","O","P"))

  actor1 actor2 actor3
1      A   <NA>      L
2      B      Z      M
3   <NA>      W      N
4      C   <NA>      O
5   <NA>      X      P

Should become:

  actor1 actor2 actor3
1      A      L   <NA>
2      B      Z      M
3      W      N   <NA>
4      C      O   <NA>
5      X      P   <NA>

coalesce() will pull W and X to the first column. Perfect. But how do I do the same for subsequent columns? For example, since W was pulled from actor2 to actor1, I now want the third row of actor2 to have the value N, not W.

Upvotes: 3

Views: 751

Answers (2)

akrun
akrun

Reputation: 887911

An option is to use apply with MARGIN=1 to loop over the rows, concatenate (c) the non-NA elements followed by the NA elements

movies[] <- t(apply(movies, 1, function(x) c(x[!is.na(x)], x[is.na(x)])))
movies
# actor1 actor2 actor3
#1      A      L   <NA>
#2      B      Z      M
#3      W      N   <NA>
#4      C      O   <NA>
#5      X      P   <NA>

Also, if it is a subset of columns, then use startsWith

i1 <- startsWith(names(movies), "actor")

and update only those columns

movies[i1] <-  t(apply(movies[i1], 1, function(x) c(x[!is.na(x)], x[is.na(x)])))

Upvotes: 1

jay.sf
jay.sf

Reputation: 73742

A way identifying the actors* columns first (probably there are columns of different kind). Basically this rips out the NAs off the rows and harmonizes the lengths afterwards.

ac.cols <- grep("^actor\\d$", names(movies), value=TRUE)
movies[ac.cols] <- lapply(movies[ac.cols], as.character)

res <- setNames(do.call(rbind.data.frame, 
                        lapply(1:nrow(movies), function(m) 
                          `length<-`(
                            na.omit(unlist(movies[m, ac.cols])),
                            ncol(movies)))), 
                ac.cols)
res
#   actor1 actor2 actor3
# 1      A      L   <NA>
# 2      B      Z      M
# 3      W      N   <NA>
# 4      C      O   <NA>
# 5      X      P   <NA>

Upvotes: 1

Related Questions