I'm looking to pull only those entries which occur exactly once within a dataframe. As an example:
DataFrame1
Col1 Col2
ABC     5
DEF     6
DEF     7
HIJ     8
I would like to pull only:
DataFrame2
ABC
HIJ
Uniqueness should be determined by Col1 alone.
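For anyone who wants to try the answers below, here is a reproducible version of the example. This is a reconstruction from the layout above; the column types (Col1 as character, Col2 as integer) are assumed, not stated in the question.

```r
# Example input reconstructed from the question; column types are assumptions
DataFrame1 <- data.frame(
  Col1 = c("ABC", "DEF", "DEF", "HIJ"),
  Col2 = c(5L, 6L, 7L, 8L),
  stringsAsFactors = FALSE  # keep Col1 as character on older R versions too
)

# How often each Col1 value occurs: ABC 1, DEF 2, HIJ 1
counts <- table(DataFrame1$Col1)

# The values we want in DataFrame2 are those with a count of exactly 1
expected <- c("ABC", "HIJ")
```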
Any ideas?
Thanks!
How about:
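# If you want just the unique values:
names(which(table(DataFrame1[, "Col1"]) == 1))
# If you want the whole corresponding rows, match by value, because
# which() returns positions in the sorted table, not row numbers:
DataFrame1[DataFrame1[, "Col1"] %in% names(which(table(DataFrame1[, "Col1"]) == 1)), ]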
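One pitfall worth checking here: which() applied to a table() result returns positions within the alphabetically sorted table, not row numbers of the data frame. The sketch below, which assumes the question's example data, shows that position-based indexing picks up row 3 (DEF 7), while matching on the value names returns the intended rows.

```r
DataFrame1 <- data.frame(
  Col1 = c("ABC", "DEF", "DEF", "HIJ"),
  Col2 = c(5L, 6L, 7L, 8L),
  stringsAsFactors = FALSE
)

counts <- table(DataFrame1$Col1)  # sorted by name: ABC, DEF, HIJ

# Positions inside the table (ABC = 1, HIJ = 3), not data-frame rows
pos <- which(counts == 1)

# Using those positions as row indices selects rows 1 and 3: "ABC", "DEF"
wrong <- DataFrame1[pos, "Col1"]

# Matching by value instead selects rows 1 and 4 as intended
singletons <- names(counts)[counts == 1]
rows <- DataFrame1[DataFrame1$Col1 %in% singletons, ]
```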
It's a bit cumbersome, but this works:
x <- table(DataFrame1[, 1]) == 1
DataFrame2 <- na.omit(data.frame(ifelse(x, names(x),NA)))
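The ifelse()/na.omit() round trip can probably be avoided: x is already a logical vector named by the distinct Col1 values, so its names can be indexed with it directly. A minimal sketch, assuming the question's example data:

```r
DataFrame1 <- data.frame(
  Col1 = c("ABC", "DEF", "DEF", "HIJ"),
  Col2 = c(5L, 6L, 7L, 8L),
  stringsAsFactors = FALSE
)

# TRUE for values that occur exactly once, named by the value itself
x <- table(DataFrame1[, 1]) == 1

# Keep the names whose flag is TRUE; no NA placeholders to strip afterwards
DataFrame2 <- data.frame(Col1 = names(x)[x], stringsAsFactors = FALSE)
```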
Or, more elegantly, with SQL via the sqldf package:
library(sqldf)
DataFrame2 <- sqldf('select Col1 from DataFrame1 group by Col1 having count(Col1) = 1')
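If installing sqldf is not an option, the same GROUP BY ... HAVING logic can be sketched in base R with aggregate(). This is a swapped-in base-R technique, not part of sqldf, and it assumes the question's example data; the count column name n is just an illustrative choice.

```r
DataFrame1 <- data.frame(
  Col1 = c("ABC", "DEF", "DEF", "HIJ"),
  Col2 = c(5L, 6L, 7L, 8L),
  stringsAsFactors = FALSE
)

# One row per Col1 value with its row count, like GROUP BY Col1 + count()
counts <- aggregate(Col2 ~ Col1, data = DataFrame1, FUN = length)
names(counts)[2] <- "n"  # hypothetical name for the count column

# Keep groups with exactly one row, like HAVING count(Col1) = 1
DataFrame2 <- counts[counts$n == 1, "Col1", drop = FALSE]
```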
You can use ave() to create a vector of counts of the values in Col1 and subset based on that:
mydf[with(mydf, ave(Col1, Col1, FUN = length)) == "1", ]
#   Col1 Col2
# 1  ABC    5
# 4  HIJ    8
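To see what ave() produces, it helps to look at the count vector on its own. This sketch assumes mydf holds the question's example data, and it passes seq_along(Col1) as the value argument instead of Col1 itself (a substitution, not the answer's original code) so the counts stay numeric even if Col1 happened to be a factor.

```r
mydf <- data.frame(
  Col1 = c("ABC", "DEF", "DEF", "HIJ"),
  Col2 = c(5L, 6L, 7L, 8L),
  stringsAsFactors = FALSE
)

# Each row gets the size of its Col1 group: 1 2 2 1
n <- with(mydf, ave(seq_along(Col1), Col1, FUN = length))

# Rows whose group has exactly one member
singles <- mydf[n == 1, ]
```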
Or, similarly, with "data.table":
library(data.table)
DT <- data.table(mydf)
DT[, id := .N, by = Col1][id == 1]
#    Col1 Col2 id
# 1:  ABC    5  1
# 2:  HIJ    8  1
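The data.table version attaches each group's size (.N) to every row and then filters on it. A rough base-R analogue, offered as a swapped-in technique rather than anything data.table provides, looks up each row's count by name in a table(); it assumes mydf holds the question's example data.

```r
mydf <- data.frame(
  Col1 = c("ABC", "DEF", "DEF", "HIJ"),
  Col2 = c(5L, 6L, 7L, 8L),
  stringsAsFactors = FALSE
)

counts <- table(mydf$Col1)

# Look up each row's group size by value, similar in spirit to .N by = Col1
n <- as.vector(counts[as.character(mydf$Col1)])

singles <- mydf[n == 1, ]
```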
duplicated() also works if you run it twice, once from each direction:
mydf[!(duplicated(mydf$Col1) | duplicated(mydf$Col1, fromLast=TRUE)), ]
#   Col1 Col2
# 1  ABC    5
# 4  HIJ    8
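Why both directions are needed is easiest to see with the two logical masks side by side. A small sketch on the Col1 values from the question:

```r
x <- c("ABC", "DEF", "DEF", "HIJ")

first_pass  <- duplicated(x)                   # FALSE FALSE  TRUE FALSE: repeats after the first
second_pass <- duplicated(x, fromLast = TRUE)  # FALSE  TRUE FALSE FALSE: repeats before the last

# A value occurring once is never flagged by either pass
keep <- !(first_pass | second_pass)            #  TRUE FALSE FALSE  TRUE
result <- x[keep]                              # "ABC" "HIJ"
```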
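The help page for unique() refers to duplicated(), which can help here (although I haven't tested it):
# duplicated() alone flags only the second and later occurrences,
# so also run it from the end to catch every value that repeats
dup <- duplicated(DataFrame1$Col1) | duplicated(DataFrame1$Col1, fromLast = TRUE)
DataFrame2 <- DataFrame1[!dup, ]
Or with subset():
DataFrame2 <- subset(DataFrame1, subset = !dup)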
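A caveat on calling duplicated() in one direction only: it flags just the second and later occurrences, so !duplicated() behaves like unique() and still keeps the first DEF row. A quick check, assuming the question's example data:

```r
DataFrame1 <- data.frame(
  Col1 = c("ABC", "DEF", "DEF", "HIJ"),
  Col2 = c(5L, 6L, 7L, 8L),
  stringsAsFactors = FALSE
)

# One-directional check: only row 3 (the second DEF) is flagged
dup_once <- duplicated(DataFrame1$Col1)

# Keeps ABC 5, DEF 6 and HIJ 8, so DEF is not excluded
one_way <- DataFrame1[!dup_once, ]
```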