Reputation: 1043
I have a dataset of movies with several columns listing actors/actresses appearing in the movie. The data is messy and sometimes the first column contains a missing value but the second contains an actor's name. I want to keep all the actor columns but move each non-missing value to the earliest column. For example:
movies <- data.frame(actor1=c("A","B",NA,"C",NA), actor2=c(NA, "Z", "W", NA, "X"), actor3=c("L","M","N","O","P"))
actor1 actor2 actor3
1 A <NA> L
2 B Z M
3 <NA> W N
4 C <NA> O
5 <NA> X P
Should become:
actor1 actor2 actor3
1 A L <NA>
2 B Z M
3 W N <NA>
4 C O <NA>
5 X P <NA>
coalesce()
will pull W and X to the first column. Perfect. But how do I do the same for subsequent columns? For example, since W was pulled from actor2 to actor1, I now want the third row of actor2 to have the value N, not W.
Upvotes: 3
Views: 751
Reputation: 887911
An option is to use apply
with MARGIN=1
to loop over the rows, concatenate (c
) the non-NA elements followed by the NA
elements
movies[] <- t(apply(movies, 1, function(x) c(x[!is.na(x)], x[is.na(x)])))
movies
# actor1 actor2 actor3
#1 A L <NA>
#2 B Z M
#3 W N <NA>
#4 C O <NA>
#5 X P <NA>
Also, if it is a subset of columns, then use startsWith
i1 <- startsWith(names(movies), "actor")
and update only those columns
movies[i1] <- t(apply(movies[i1], 1, function(x) c(x[!is.na(x)], x[is.na(x)])))
Upvotes: 1
Reputation: 73742
A way identifying the actors*
columns first (probably there are columns of different kind). Basically this rips out the NA
s off the rows and harmonizes the length
s afterwards.
ac.cols <- grep("^actor\\d$", names(movies), value=TRUE)
movies[ac.cols] <- lapply(movies[ac.cols], as.character)
res <- setNames(do.call(rbind.data.frame,
lapply(1:nrow(movies), function(m)
`length<-`(
na.omit(unlist(movies[m, ac.cols])),
ncol(movies)))),
ac.cols)
res
# actor1 actor2 actor3
# 1 A L <NA>
# 2 B Z M
# 3 W N <NA>
# 4 C O <NA>
# 5 X P <NA>
Upvotes: 1