I have a data frame with several character variables, and I want to find the unique string in each row. In every row, only one particular string is repeated across several columns, and the remaining cells are NA. For example, the data frame "df":
  Col1 Col2 Col3
1  ABC  ABC   NA
2   NA  DEF  DEF
3  GHI   NA   NA
4  JKL  JKL  JKL
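A minimal sketch to rebuild this example data frame so the code in this thread can be run directly. It assumes the columns are plain character vectors (not factors) with missing cells stored as NA.

```r
# Rebuild the example data frame from the question.
# Assumption: all three columns are character vectors; empty cells are NA.
df <- data.frame(
  Col1 = c("ABC", NA, "GHI", "JKL"),
  Col2 = c("ABC", "DEF", NA, "JKL"),
  Col3 = c(NA, "DEF", NA, "JKL"),
  stringsAsFactors = FALSE
)

# Show the structure to confirm the columns are character, not factor.
str(df)
```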
As output, I would like to get:
ABC
DEF
GHI
JKL
Ideally, this would be some kind of apply function over the rows. I tried several variations of
apply(df,1, function(x) unique(x))
but none of them worked. I suspect there is an easy way if you know the right function. How can I do that?
We can use is.na to drop the NA elements before calling unique:
unname(apply(df, 1, FUN = function(x) unique(x[!is.na(x)])))
#[1] "ABC" "DEF" "GHI" "JKL"
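To see why the is.na filter matters, here is a sketch comparing it with the attempt from the question on the same data. Without the filter, unique() keeps NA as one of the values, so rows 1 to 3 yield two values while row 4 yields one, and apply() falls back to returning a list.

```r
df <- data.frame(
  Col1 = c("ABC", NA, "GHI", "JKL"),
  Col2 = c("ABC", "DEF", NA, "JKL"),
  Col3 = c(NA, "DEF", NA, "JKL"),
  stringsAsFactors = FALSE
)

# Attempt from the question: NA survives unique(), e.g. row 1 gives c("ABC", NA)
# but row 4 gives just "JKL"; the unequal lengths make apply() return a list.
attempt <- apply(df, 1, function(x) unique(x))
is.list(attempt)  # TRUE

# Removing NA first leaves exactly one value per row, so apply() simplifies
# the result to a plain character vector.
fixed <- unname(apply(df, 1, FUN = function(x) unique(x[!is.na(x)])))
fixed
```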
If a row has more than one unique element, apply no longer returns a simple vector: it returns a list when the number of unique elements differs between rows, and a matrix when every row has the same number. In that case, we can collapse each row's values into a single string with toString:
unname(apply(df, 1, FUN = function(x) toString(unique(x[!is.na(x)]))))
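A quick check of the toString variant on a hypothetical data frame df2 (not from the question) in which one row holds two different strings. The result stays one character value per row, with multiple values joined by ", ".

```r
# Hypothetical data: row 2 contains two distinct strings, "DEF" and "XYZ".
df2 <- data.frame(
  Col1 = c("ABC", "DEF", "GHI"),
  Col2 = c("ABC", "XYZ", NA),
  Col3 = c(NA, "DEF", NA),
  stringsAsFactors = FALSE
)

# toString() joins each row's unique non-NA values with ", ",
# giving "ABC", "DEF, XYZ" and "GHI" for the three rows.
res <- unname(apply(df2, 1, FUN = function(x) toString(unique(x[!is.na(x)]))))
res
```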
Another option, if there is only a single unique element per row, is pmax with na.rm = TRUE:
do.call(pmax, c(df, list(na.rm=TRUE)))
#[1] "ABC" "DEF" "GHI" "JKL"
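The single-value condition matters because pmax on character vectors keeps the lexicographically largest string in each row; it does not check that the values agree. A sketch on a hypothetical df2 where one row holds two different strings, plus a hypothetical guard (n_unique) that counts distinct non-NA values per row before trusting the pmax result:

```r
# Hypothetical data: row 2 has two different strings.
df2 <- data.frame(
  Col1 = c("ABC", "DEF"),
  Col2 = c(NA, "XYZ"),
  stringsAsFactors = FALSE
)

# pmax() compares element-wise; na.rm = TRUE ignores the NA in row 1,
# but in row 2 it silently returns "XYZ" and drops "DEF".
out <- do.call(pmax, c(df2, list(na.rm = TRUE)))
out

# Hypothetical guard: number of distinct non-NA values in each row.
n_unique <- apply(df2, 1, function(x) length(unique(x[!is.na(x)])))
all(n_unique <= 1)  # FALSE here, so pmax is not safe for this data
```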