Rockbar

Reputation: 1161

find unique strings in data frame variables

I have a data frame with several character variables, and I want to find the unique string in each row. In each row, a single string is repeated across several columns and the remaining cells are NA. For example, the data frame "df":

  Col1 Col2 Col3
1  ABC  ABC   NA
2   NA  DEF  DEF
3  GHI   NA   NA
4  JKL  JKL  JKL
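For anyone who wants to reproduce this, here is a minimal sketch that builds df. It assumes the columns are character vectors (not factors) and that the empty cells are real NA values rather than the string "NA".

```r
# Rebuild the example data frame from the question.
# Assumption: character columns, so NA is a genuine missing value.
df <- data.frame(
  Col1 = c("ABC", NA, "GHI", "JKL"),
  Col2 = c("ABC", "DEF", NA, "JKL"),
  Col3 = c(NA, "DEF", NA, "JKL"),
  stringsAsFactors = FALSE
)
str(df)
```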

As output I would like to have:

ABC
DEF
GHI
JKL

Ideally this would be some kind of apply function over each row. I tried several variations of

apply(df,1, function(x) unique(x))

but none of them worked. I suspect there is an easy way if you know the right function. How can I do this?
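A quick check of what the attempted call returns, assuming df has character columns as in the table above: unique() treats NA as a distinct value, so rows 1 to 3 each keep an NA alongside the string, while row 4 has a single value. Because the per-row results have different lengths, apply() returns a list instead of a character vector.

```r
df <- data.frame(
  Col1 = c("ABC", NA, "GHI", "JKL"),
  Col2 = c("ABC", "DEF", NA, "JKL"),
  Col3 = c(NA, "DEF", NA, "JKL"),
  stringsAsFactors = FALSE
)

# The attempt from the question: NA survives unique(), so the
# results have unequal lengths and apply() falls back to a list.
res <- apply(df, 1, function(x) unique(x))
str(res)
```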

Upvotes: 3

Views: 3059

Answers (2)

akrun

Reputation: 887108

We can use is.na to remove the NA elements before calling unique:

unname(apply(df, 1, FUN = function(x) unique(x[!is.na(x)])))
#[1] "ABC" "DEF" "GHI" "JKL"

If a row has more than one unique element, apply returns a list (when the number of elements differs between rows) or a matrix (when every row has the same number of elements). In that case, we can paste the values together into a single string per row:

unname(apply(df, 1, FUN = function(x) toString(unique(x[!is.na(x)])))) 
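To illustrate the multi-value case, here is a sketch using a small hypothetical data frame, df2, in which the first row holds two different strings. toString() joins the unique non-NA values with ", ", so each row still yields exactly one string and the result stays a plain character vector.

```r
# Hypothetical example: row 1 contains two distinct strings.
df2 <- data.frame(
  Col1 = c("ABC", NA),
  Col2 = c("XYZ", "DEF"),
  Col3 = c("ABC", "DEF"),
  stringsAsFactors = FALSE
)

# Drop NAs, keep unique values, then collapse them into one string per row.
res <- unname(apply(df2, 1, FUN = function(x) toString(unique(x[!is.na(x)]))))
print(res)
```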

Another option is pmax, if there is only a single unique element per row (with several distinct values in a row, pmax silently returns the lexically largest one):

do.call(pmax, c(df, list(na.rm = TRUE)))
#[1] "ABC" "DEF" "GHI" "JKL"
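A sketch of the pmax caveat, using a hypothetical data frame df3 whose first row holds two different strings. c(df3, list(na.rm = TRUE)) builds an argument list of the columns plus na.rm, and pmax compares the character values element-wise, so row 1 returns only "XYZ" and "ABC" is lost without any warning.

```r
# Hypothetical example: row 1 has two distinct non-NA values.
df3 <- data.frame(
  Col1 = c("ABC", NA),
  Col2 = c("XYZ", "DEF"),
  stringsAsFactors = FALSE
)

# Each column becomes one argument to pmax(); na.rm = TRUE skips the NA cells.
res <- do.call(pmax, c(df3, list(na.rm = TRUE)))
print(res)
```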

Upvotes: 3

user2100721

Reputation: 3587

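Another option, if the columns are factors (the data.frame() default before R 4.0.0). Note that it returns the distinct values of the whole data frame rather than one value per row: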

levels(unlist(df))
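A sketch of why the column type matters, building the same data twice (as factors and as characters). When every column is a factor, unlist() returns a factor whose levels are the union of the columns' levels, so levels() yields the distinct non-NA strings; with character columns unlist() returns a plain character vector, which has no levels, so the result is NULL. The levels also follow the order of the columns' level sets, not the row order.

```r
cols <- list(
  Col1 = c("ABC", NA, "GHI", "JKL"),
  Col2 = c("ABC", "DEF", NA, "JKL"),
  Col3 = c(NA, "DEF", NA, "JKL")
)
df_fac <- data.frame(cols, stringsAsFactors = TRUE)   # factor columns
df_chr <- data.frame(cols, stringsAsFactors = FALSE)  # character columns

lv_fac <- levels(unlist(df_fac))  # union of the factor levels
lv_chr <- levels(unlist(df_chr))  # NULL: character vectors carry no levels
print(lv_fac)
print(lv_chr)
```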

Upvotes: 1
