Reputation: 1532
I have a data frame with two columns "a" and "b" with alternating missing values (NA
)
a b
dog <NA>
mouse <NA>
<NA> cat
bird <NA>
I want to "merge" / combine them to a new column c that looks like this, i.e. the non-NA
element in each row is selected:
c
dog
mouse
cat
bird
I tried merge
and join
, but neither worked as I wanted. Maybe because I do not have an id with which to merge? For integers I would just circumvent this and add both columns, but how in my case?
Upvotes: 15
Views: 12712
Reputation: 444
Use tidyr::unite
to be safe in case of a row containing two values:
df <- df |>
unite(c,
c(a, b),
remove = FALSE,
na.rm = TRUE)
Upvotes: 1
Reputation: 181
dplyr
has exactly what you are looking for, function coalesce()
library(dplyr)
a<-c("dog","mouse",NA,"bird")
b<-c(NA,NA,"cat",NA)
coalesce(a,b)
[1] "dog" "mouse" "cat" "bird"
Upvotes: 12
Reputation: 5193
Using if else logic:
a<-c("dog","mouse",NA,"bird")
b<-c(NA,NA,"cat",NA)
test.df <-data.frame(a,b, stringsAsFactors = FALSE)
test.df$c <- ifelse(is.na(test.df$a), test.df$b, test.df$a)
test.df
a b c
1 dog <NA> dog
2 mouse <NA> mouse
3 <NA> cat cat
4 bird <NA> bird
Upvotes: 2
Reputation: 37889
You could use a simple apply
:
df$c <- apply(df,1,function(x) x[!is.na(x)] )
> df
a b c
1 dog <NA> dog
2 mouse <NA> mouse
3 <NA> cat cat
4 bird <NA> bird
Upvotes: 2
Reputation: 67818
You may try pmax
df$c <- pmax(df$a, df$b)
df
# a b c
# 1 dog <NA> dog
# 2 mouse <NA> mouse
# 3 <NA> cat cat
# 4 bird <NA> bird
...or ifelse
:
df$c <- ifelse(is.na(df$a), df$b, df$a)
For more general solutions in cases with more than two columns, you find several ways to implement coalesce in R here.
Upvotes: 12
Reputation: 887721
Another option is to use which
with arr.ind=TRUE
indx <- which(!is.na(df), arr.ind=TRUE)
df$c <- df[indx][order(indx[,1])]
df
# a b c
#1 dog <NA> dog
#2 mouse <NA> mouse
#3 <NA> cat cat
#4 bird <NA> bird
Or
df$c <- df[cbind(1:nrow(df),max.col(!is.na(df)))]
Upvotes: 5
Reputation: 92300
Here's my attempt (modified by @MrFlick)
df$c <- apply(df, 1, function(x) na.omit(x)[1])
df
# a b c
# 1 dog <NA> dog
# 2 mouse <NA> mouse
# 3 <NA> cat cat
# 4 bird <NA> bird
Upvotes: 5
Reputation: 206496
I wrote a coalesce() function for this type of task which works much like the SQL coalesce function. You would use it like
dd<-read.table(text="a b
dog NA
mouse NA
NA cat
bird NA", header=T)
dd$c <- with(dd, coalesce(a,b))
dd
# a b c
# 1 dog <NA> dog
# 2 mouse <NA> mouse
# 3 <NA> cat cat
# 4 bird <NA> bird
Upvotes: 9