Find unique strings in data frame variables

Question

I have a data frame with several character variables, and I want to find the unique string in each row. Each row contains a single string, possibly repeated across several columns, with NAs in the remaining columns. For example, the data frame "df":

  Col1 Col2 Col3
1 ABC  ABC  NA
2  NA  DEF  DEF
3 GHI  NA   NA
4 JKL  JKL  JKL

As an output I would like to have

ABC
DEF
GHI
JKL

Ideally I would use some kind of apply function over the rows. I tried several variations of

apply(df,1, function(x) unique(x))

But that was not successful. I suspect there is an easy way if you know the right function. How can I do that?

akrun · Accepted Answer

We can use is.na to remove the NA elements before calling unique on each row:

unname(apply(df, 1, FUN = function(x) unique(x[!is.na(x)])))
#[1] "ABC" "DEF" "GHI" "JKL"
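
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example first. The data.frame() call is an assumption about how "df" was created (plain character columns, not factors); the apply line is the same as above.

```r
# Rebuild the example data frame from the question
# (assumption: plain character columns, not factors).
df <- data.frame(
  Col1 = c("ABC", NA, "GHI", "JKL"),
  Col2 = c("ABC", "DEF", NA, "JKL"),
  Col3 = c(NA, "DEF", NA, "JKL"),
  stringsAsFactors = FALSE
)

# For each row, drop the NAs and keep the distinct remaining values.
res <- unname(apply(df, 1, function(x) unique(x[!is.na(x)])))
print(res)
```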

If a row contains more than one unique element, apply no longer returns a simple vector: it returns a list when the number of unique elements differs between rows, or a matrix when every row has the same number (greater than one). In that case, we can paste the elements together to create a single string per row

unname(apply(df, 1, FUN = function(x) toString(unique(x[!is.na(x)]))))
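
To see the difference, here is a small sketch on a hypothetical data frame "df2" (not from the question) whose first row holds two distinct strings:

```r
# Hypothetical data: row 1 has two distinct strings, row 2 has one.
df2 <- data.frame(
  Col1 = c("ABC", NA),
  Col2 = c("XYZ", "DEF"),
  Col3 = c(NA, "DEF"),
  stringsAsFactors = FALSE
)

# Without toString the per-row results have different lengths,
# so apply() falls back to returning a list.
as_list <- apply(df2, 1, function(x) unique(x[!is.na(x)]))

# With toString each row collapses to one comma-separated string.
as_string <- unname(apply(df2, 1, function(x) toString(unique(x[!is.na(x)]))))
print(as_string)
```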

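Another option is pmax, which works when there is only a single unique element per row, because the parallel maximum of identical strings (ignoring NAs) is that string itself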

do.call(pmax, c(df, list(na.rm=TRUE)))
#[1] "ABC" "DEF" "GHI" "JKL"
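
One caveat worth sketching: the pmax approach relies on the columns being character. If the data frame was built with factor columns, pmax may not behave as intended, so it seems safer to convert first. The data frame "df_f" below is a hypothetical factor version of the example, not something from the question.

```r
# Hypothetical factor version of the example data frame.
df_f <- data.frame(
  Col1 = c("ABC", NA, "GHI", "JKL"),
  Col2 = c("ABC", "DEF", NA, "JKL"),
  Col3 = c(NA, "DEF", NA, "JKL"),
  stringsAsFactors = TRUE
)

# Convert every column to character while keeping the data frame shape.
df_f[] <- lapply(df_f, as.character)

# Parallel maximum across the columns, ignoring NAs.
res_pmax <- do.call(pmax, c(df_f, list(na.rm = TRUE)))
print(res_pmax)
```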
