Mike

Reputation: 1069

R: pulling only entries which are unique from a dataframe

I'm looking to pull only those entries which occur exactly once within a dataframe. As an example:

DataFrame1  
Col1 Col2   
ABC   5  
DEF   6  
DEF   7  
HIJ   8

I would like to pull only:

DataFrame2  
ABC  
HIJ

Where the uniqueness is determined only by Col1.
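For anyone who wants to try the answers below, the example data can be rebuilt roughly like this. It is only a sketch: the column types and `stringsAsFactors = FALSE` are assumptions, since the question shows the values but not how the data frame was created.

```r
# Rebuild the example data from the question.
# Assumption: Col1 is character and Col2 is numeric.
DataFrame1 <- data.frame(
  Col1 = c("ABC", "DEF", "DEF", "HIJ"),
  Col2 = c(5, 6, 7, 8),
  stringsAsFactors = FALSE
)

# Inspect the structure to confirm the column types.
str(DataFrame1)
```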

Any ideas?

Thanks!

Upvotes: 0

Views: 127

Answers (4)

jac

Reputation: 630

How about:

# If you want just the unique values.
tab <- table(DataFrame1[, "Col1"])
names(tab)[tab == 1]

# If you want the whole corresponding row.
DataFrame1[DataFrame1[, "Col1"] %in% names(tab)[tab == 1], ]
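One thing to watch with table(): its result has one entry per distinct value, not one per row, so positions returned by which() are positions in the table and cannot be used directly as row numbers. A small sketch on the question's data (rebuilt here under the assumption that Col1 is character) shows the difference; the names `tab`, `pos`, `once` and `rows` are just illustrative.

```r
# Example data, rebuilt from the question (column types are assumed).
DataFrame1 <- data.frame(
  Col1 = c("ABC", "DEF", "DEF", "HIJ"),
  Col2 = c(5, 6, 7, 8),
  stringsAsFactors = FALSE
)

# One count per distinct value: ABC, DEF, HIJ.
tab <- table(DataFrame1[, "Col1"])

# Positions within the table, not row numbers of the data frame.
pos <- which(tab == 1)

# The values that occur exactly once, taken from the table's names.
once <- names(tab)[tab == 1]

# Match those values back against the column to pick the rows.
rows <- DataFrame1[DataFrame1[, "Col1"] %in% once, ]
print(rows)
```

Here `pos` points at table entries 1 and 3 (ABC and HIJ), whereas row 3 of the data frame is a DEF row, which is why matching on the names is the safer route.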

Upvotes: 0

Jonas Tundo

Reputation: 6197

It's a bit cumbersome, but this works:

x <- table(DataFrame1[, 1]) == 1
DataFrame2 <- na.omit(data.frame(ifelse(x, names(x),NA)))

Or, more elegantly, with SQL:

library(sqldf)

DataFrame2 <- sqldf('select Col1 from DataFrame1 group by Col1 having count(Col1) = 1')

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You can use ave to create a vector of counts of the values in Col1 and subset based on that:

mydf[with(mydf, ave(Col1, Col1, FUN = length)) == "1", ]
#   Col1 Col2
# 1  ABC    5
# 4  HIJ    8
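A note on why the comparison above is against "1" rather than 1: ave returns a vector of the same type as its first argument, so with a character Col1 the counts come back as strings. The sketch below assumes mydf holds the question's data with Col1 as character, and shows a variant that passes seq_along() as the first argument so the counts stay integer (which should also behave if Col1 is a factor).

```r
# Assumed contents of mydf, taken from the question's example.
mydf <- data.frame(
  Col1 = c("ABC", "DEF", "DEF", "HIJ"),
  Col2 = c(5, 6, 7, 8),
  stringsAsFactors = FALSE
)

# Counts per group, stored in a character vector because Col1 is character.
n_chr <- ave(mydf$Col1, mydf$Col1, FUN = length)

# Same counts, but integer, because the first argument is an integer vector.
n_int <- ave(seq_along(mydf$Col1), mydf$Col1, FUN = length)

# Keep the rows whose Col1 value occurs exactly once.
result <- mydf[n_int == 1, ]
print(result)
```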

Or, similarly, with "data.table":

library(data.table)
DT <- data.table(mydf)
DT[, id := .N, by = Col1][id == 1]
#    Col1 Col2 id
# 1:  ABC    5  1
# 2:  HIJ    8  1

duplicated also works if you run it twice, once from each direction:

mydf[!(duplicated(mydf$Col1) | duplicated(mydf$Col1, fromLast=TRUE)), ]
#   Col1 Col2
# 1  ABC    5
# 4  HIJ    8
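To see why both directions are needed, it may help to look at the intermediate vectors. A single call to duplicated only marks the second and later occurrences, so the first DEF slips through; scanning from the end marks the earlier one instead. The variable names below are just for illustration.

```r
# The Col1 values from the question.
x <- c("ABC", "DEF", "DEF", "HIJ")

# Forward scan: marks the second DEF only.
fwd <- duplicated(x)

# Backward scan: marks the first DEF only.
bwd <- duplicated(x, fromLast = TRUE)

# A value is kept only if neither scan flagged it.
keep <- !(fwd | bwd)

print(x[keep])
```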

Upvotes: 1

llrs

Reputation: 3397

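The help page for unique refers to duplicated, which can help you (although I haven't tested this). Note that duplicated only flags the second and later occurrences of a value, so run it from both directions to drop every repeated value:

dup <- duplicated(DataFrame1$Col1) | duplicated(DataFrame1$Col1, fromLast = TRUE)
DataFrame2 <- DataFrame1[!dup, ]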

Or with subset:

DataFrame2 <- subset(DataFrame1, subset=!dup)

Upvotes: 0
