Thoegernh
Thoegernh

Reputation: 95

Joining three factor columns in one data frame

I have a dataset like the generic dataset below and wish to join the three sitevariables into one.

 df <- data.frame(var1 = c("site1", NA, NA, NA, "site3", "site4"),
                  var2 = c(NA, NA, "NA", "site2", NA, "site4"),
                  var3 = c("site1", NA, "Site2", "site2", "site3", NA))

   var1  var2  var3
1 site1  <NA> site1
2  <NA>  <NA>  <NA>
3  <NA>    NA Site2
4  <NA> site2 site2
5 site3  <NA> site3
6 site4 site4  <NA>

The data frame I wish to obtain should thus be like this:

       var1  var2  var3  var4
1 site1  <NA> site1 site1
2  <NA>  <NA>  <NA>  <NA>
3  <NA>    NA Site2 site2
4  <NA> site2 site2 site2
5 site3  <NA> site3 site3
6 site4 site4  <NA> site4

I have tried using coalesce in the sqldf package, which gives the correct output, but str() reveals that the output is a data frame with one variable. and not a string of factors as I need.

library(sqldf)
df$var4 <-sqldf("select coalesce(var1, var2, var3) from df")
> df
   var1  var2  var3 coalesce(var1, var2, var3)
1 site1  <NA> site1                      site1
2  <NA>  <NA>  <NA>                       <NA>
3  <NA>    NA Site2                         NA
4  <NA> site2 site2                      site2
5 site3  <NA> site3                      site3
6 site4 site4  <NA>                      site4
> str(df)
'data.frame':   6 obs. of  4 variables:
 $ var1: Factor w/ 3 levels "site1","site3",..: 1 NA NA NA 2 3
 $ var2: Factor w/ 3 levels "NA","site2","site4": NA NA 1 2 NA 3
 $ var3: Factor w/ 4 levels "site1","site2",..: 1 NA 3 2 4 NA
 $ var4:'data.frame':   6 obs. of  1 variable:
  ..$ coalesce(var1, var2, var3): chr  "site1" NA "NA" "site2" ...
Warning message:

Upvotes: 0

Views: 65

Answers (5)

akrun
akrun

Reputation: 887118

We can use max.col

df$var4 <- df[cbind(1:nrow(df), max.col(!is.na(df), "first"))]
df
#   var1  var2  var3  var4
#1 site1  <NA> site1 site1
#2  <NA>  <NA>  <NA>  <NA>
#3  <NA>  <NA> Site2 Site2
#4  <NA> site2 site2 site2
#5 site3  <NA> site3 site3
#6 site4 site4  <NA> site4

Upvotes: 1

Gopala
Gopala

Reputation: 10483

Another option is to use something like this:

df$var4 <- apply(df, 1, min, na.rm = TRUE)

Output is as follows:

   var1  var2  var3  var4
1 site1  <NA> site1 site1
2  <NA>  <NA>  <NA>  <NA>
3  <NA>  <NA> Site2 Site2
4  <NA> site2 site2 site2
5 site3  <NA> site3 site3
6 site4 site4  <NA> site4

Upvotes: 2

digEmAll
digEmAll

Reputation: 57210

Maybe this might be a possible solution :

df<- data.frame(var1 = c("site1", NA, NA, NA, "site3", "site4"),
                var2 = c(NA, NA, "NA", "site2", NA, "site4"),
                var3 = c("site1", NA, "Site2", "site2", "site3", NA))

getLastNonNA <- function(v){
  notNAs <- !is.na(v)
  if(!any(notNAs))
    return(NA)
  tail(v[notNAs],1)
}

# if you prefer the first non-NA
#getFirstNonNA <- function(v){
#  notNAs <- !is.na(v)
#  if(!any(notNAs))
#    return(NA)
#  head(v[notNAs],1)
#}

df$var4 <- apply(df,1,getLastNonNA)

> df
   var1  var2  var3  var4
1 site1  <NA> site1 site1
2  <NA>  <NA>  <NA>  <NA>
3  <NA>    NA Site2 Site2
4  <NA> site2 site2 site2
5 site3  <NA> site3 site3
6 site4 site4  <NA> site4

Note that the last column is a characters vector, but you can easily convert to factor if you want (using as.factor).

Upvotes: 1

Sotos
Sotos

Reputation: 51592

You can use na.locf from zoo package to get var4,

library(zoo)
df$var4 <- na.locf(t(df))[ncol(df),]
df
#   var1  var2  var3  var4
#1 site1  <NA> site1 site1
#2  <NA>  <NA>  <NA>  <NA>
#3  <NA>    NA Site2 Site2
#4  <NA> site2 site2 site2
#5 site3  <NA> site3 site3
#6 site4 site4  <NA> site4

Upvotes: 2

Yang
Yang

Reputation: 26

You can use cbind() to merge the two data frame like:

tmp = sqldf("select coalesce(var1, var2, var3) from df")
df = cbind(df, tmp)

then change the colname by

colnames(df)[4] = 'var4'

Upvotes: 1

Related Questions