Mikeed

Reputation: 1

Selecting unique values from single column of a data frame

I have a data frame of five character variables that represent specific bacteria. Each variable has thousands of observations, all of which begin with the letter K, e.g.:

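    # The IDs are character strings, so they must be quoted
    x <- c("K0001", "K0001", "K0003", "K0006")
    y <- c("K0001", "K0001", "K0002", "K0003")
    z <- c("K0001", "K0002", "K0007", "K0008")
    r <- c("K0001", "K0001", "K0001", "K0001")
    o <- c("K0003", "K0009", "K0009", "K0009")
    data <- data.frame(x, y, z, r, o)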

I need to identify unique observations in the first column that don't appear in any of the remaining four columns. I have tried the approach suggested here which I think would work if I could create individual vectors using select ...

How to tell what is in one vector and not another?

but when I try to create a vector for analysis using the code ...

x <- select(data$x)

I get the error

Error in UseMethod("select_") : no applicable method for 'select_' applied to an object of class "character"

I have tried to convert the vectors using as.factor and as.numeric, but neither approach works: the first gives the same error as above, and as.numeric returns NAs.

Thanks in advance

Upvotes: 0

Views: 160

Answers (1)

G5W

Reputation: 37661

The reference that you cited recommends using setdiff. The only thing you need to do to apply that solution is to combine the other four columns into a single vector, so that they can be treated as one set. You can do that with unlist:

    setdiff(data$x, unlist(data[, 2:5]))
    # [1] "K0006"
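As for the error in the question: dplyr's select works on a data frame, while data$x is already a plain character vector, so there is nothing left to select. A minimal, self-contained sketch of the whole approach follows. It assumes the data frame is called data and uses the sample values from the question; the names first, others, only_in_x and alt are just illustrative.

```r
# Sample data taken from the question (the object name `data` is an assumption)
data <- data.frame(
  x = c("K0001", "K0001", "K0003", "K0006"),
  y = c("K0001", "K0001", "K0002", "K0003"),
  z = c("K0001", "K0002", "K0007", "K0008"),
  r = c("K0001", "K0001", "K0001", "K0001"),
  o = c("K0003", "K0009", "K0009", "K0009"),
  stringsAsFactors = FALSE
)

# data$x is already a character vector, so no select() call is needed
first <- data$x

# Flatten the remaining four columns into one vector
others <- unlist(data[, 2:5], use.names = FALSE)

# Values that occur in x but in none of the other columns;
# setdiff() also drops duplicates
only_in_x <- setdiff(first, others)
print(only_in_x)

# The same result written with %in%, in case that reads more clearly
alt <- unique(first[!first %in% others])
```

setdiff already returns each value once, so no separate call to unique is needed in the first version.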

Upvotes: 1
