Reputation: 95
I have a dataset like the generic dataset below and wish to join the three sitevariables into one.
df <- data.frame(var1 = c("site1", NA, NA, NA, "site3", "site4"),
var2 = c(NA, NA, "NA", "site2", NA, "site4"),
var3 = c("site1", NA, "Site2", "site2", "site3", NA))
var1 var2 var3
1 site1 <NA> site1
2 <NA> <NA> <NA>
3 <NA> NA Site2
4 <NA> site2 site2
5 site3 <NA> site3
6 site4 site4 <NA>
The data frame I wish to obtain should thus be like this:
var1 var2 var3 var4
1 site1 <NA> site1 site1
2 <NA> <NA> <NA> <NA>
3 <NA> NA Site2 site2
4 <NA> site2 site2 site2
5 site3 <NA> site3 site3
6 site4 site4 <NA> site4
I have tried using coalesce in the sqldf package, which gives the correct output, but str() reveals that the output is a data frame with one variable. and not a string of factors as I need.
library(sqldf)
df$var4 <-sqldf("select coalesce(var1, var2, var3) from df")
> df
var1 var2 var3 coalesce(var1, var2, var3)
1 site1 <NA> site1 site1
2 <NA> <NA> <NA> <NA>
3 <NA> NA Site2 NA
4 <NA> site2 site2 site2
5 site3 <NA> site3 site3
6 site4 site4 <NA> site4
> str(df)
'data.frame': 6 obs. of 4 variables:
$ var1: Factor w/ 3 levels "site1","site3",..: 1 NA NA NA 2 3
$ var2: Factor w/ 3 levels "NA","site2","site4": NA NA 1 2 NA 3
$ var3: Factor w/ 4 levels "site1","site2",..: 1 NA 3 2 4 NA
$ var4:'data.frame': 6 obs. of 1 variable:
..$ coalesce(var1, var2, var3): chr "site1" NA "NA" "site2" ...
Warning message:
Upvotes: 0
Views: 65
Reputation: 887118
We can use max.col
df$var4 <- df[cbind(1:nrow(df), max.col(!is.na(df), "first"))]
df
# var1 var2 var3 var4
#1 site1 <NA> site1 site1
#2 <NA> <NA> <NA> <NA>
#3 <NA> <NA> Site2 Site2
#4 <NA> site2 site2 site2
#5 site3 <NA> site3 site3
#6 site4 site4 <NA> site4
Upvotes: 1
Reputation: 10483
Another option is to use something like this:
df$var4 <- apply(df, 1, min, na.rm = TRUE)
Output is as follows:
var1 var2 var3 var4
1 site1 <NA> site1 site1
2 <NA> <NA> <NA> <NA>
3 <NA> <NA> Site2 Site2
4 <NA> site2 site2 site2
5 site3 <NA> site3 site3
6 site4 site4 <NA> site4
Upvotes: 2
Reputation: 57210
Maybe this might be a possible solution :
df<- data.frame(var1 = c("site1", NA, NA, NA, "site3", "site4"),
var2 = c(NA, NA, "NA", "site2", NA, "site4"),
var3 = c("site1", NA, "Site2", "site2", "site3", NA))
getLastNonNA <- function(v){
notNAs <- !is.na(v)
if(!any(notNAs))
return(NA)
tail(v[notNAs],1)
}
# if you prefer the first non-NA
#getFirstNonNA <- function(v){
# notNAs <- !is.na(v)
# if(!any(notNAs))
# return(NA)
# head(v[notNAs],1)
#}
df$var4 <- apply(df,1,getLastNonNA)
> df
var1 var2 var3 var4
1 site1 <NA> site1 site1
2 <NA> <NA> <NA> <NA>
3 <NA> NA Site2 Site2
4 <NA> site2 site2 site2
5 site3 <NA> site3 site3
6 site4 site4 <NA> site4
Note that the last column is a characters vector, but you can easily convert to factor if you want (using as.factor
).
Upvotes: 1
Reputation: 51592
You can use na.locf
from zoo
package to get var4,
library(zoo)
df$var4 <- na.locf(t(df))[ncol(df),]
df
# var1 var2 var3 var4
#1 site1 <NA> site1 site1
#2 <NA> <NA> <NA> <NA>
#3 <NA> NA Site2 Site2
#4 <NA> site2 site2 site2
#5 site3 <NA> site3 site3
#6 site4 site4 <NA> site4
Upvotes: 2
Reputation: 26
You can use cbind() to merge the two data frame like:
tmp = sqldf("select coalesce(var1, var2, var3) from df")
df = cbind(df, tmp)
then change the colname by
colnames(df)[4] = 'var4'
Upvotes: 1