Reputation: 117
What's the most reliable way to remove rows with matching IDs between two large data frames in R?
For example, I have a list of participants who do not want to be contacted (n=200). I would like to remove them from my dataset of over 100 variables and 200,000 observations.
This is the list of 200 participant IDs that I need to remove from the dataset:
exclude=read.csv("/home/Project/file/excludeids.csv", header=TRUE, sep=",")
dataset.exclusion<- dataset[-which(exclude$ParticipantId %in% dataset$ParticipantId ), ]
Is this the correct command to use?
I don't think this command is doing what I want, because when I verify with the following:
length(which(dataset.exclusion$ParticipantId %in% exclusion$ParticipantId))
I don't get 0.
Any insight?
Upvotes: 0
Views: 147
Reputation: 121598
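You can index the dataset with a negated %in% test on its own ID column, for example (exclusion is the data frame of IDs to drop, called exclude in the question):
dataset[!dataset$ParticipantId %in%
          unique(exclusion$ParticipantId), ]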
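To illustrate why the command in the question probably does not give 0, here is a minimal sketch on made-up data. The toy values and the `score` column are assumptions for illustration only; `dataset`, `exclude` and `ParticipantId` follow the names in the question. The likely issue is that `which(exclude$ParticipantId %in% dataset$ParticipantId)` returns row positions within `exclude`, which are then used to drop rows of `dataset`.

```r
# Toy stand-ins for the real data (values are made up for illustration)
dataset <- data.frame(
  ParticipantId = c(101, 102, 103, 104, 105),
  score         = c(5, 3, 8, 1, 9)
)
exclude <- data.frame(ParticipantId = c(104, 102))

# The question's approach: these are positions in `exclude` (here 1 and 2),
# so the first two rows of `dataset` are dropped regardless of their IDs.
wrong_idx <- which(exclude$ParticipantId %in% dataset$ParticipantId)
wrong     <- dataset[-wrong_idx, ]

# Testing the dataset's own ID column gives one TRUE/FALSE per dataset row.
keep              <- !(dataset$ParticipantId %in% exclude$ParticipantId)
dataset.exclusion <- dataset[keep, ]

# Verification: number of excluded IDs still present in the result
n_left <- sum(dataset.exclusion$ParticipantId %in% exclude$ParticipantId)
```

In this sketch `wrong` still contains participant 104, while `n_left` is 0 for the logical-index version. A logical index also behaves sensibly when nothing matches, whereas `dataset[-which(...), ]` returns zero rows if `which()` comes back empty.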
Upvotes: 2
Reputation: 12905
Something like this?
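library(data.table)
# Toy dataset, keyed on the columns used for matching
dataset <- data.table(
  a = c(1,2,3,4,5,6),
  b = c(11,12,13,14,15,16),
  d = c(21,22,23,24,25,26)
)
setkeyv(dataset, c('a','b'))
# Rows to exclude, identified by the same key columns
ToExclude <- data.table(
  a = c(1,2,3),
  b = c(11,12,13)
)
# Not-join: keep only the rows of dataset with no match in ToExclude
dataset[!ToExclude]
#    a  b  d
# 1: 4 14 24
# 2: 5 15 25
# 3: 6 16 26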
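For the single-ID case in the question, the same not-join can be written without setting a key by naming the join column with `on =`. This is only a sketch: the toy values and the `score` column are assumptions, and it presumes both tables store `ParticipantId` with the same type.

```r
library(data.table)

# Toy stand-ins for the real tables (values are made up for illustration)
dataset <- data.table(
  ParticipantId = 1:6,
  score         = c(11, 12, 13, 14, 15, 16)
)
exclude <- data.table(ParticipantId = c(2L, 4L))

# Not-join on the ID column: keep dataset rows whose ID is absent from exclude
kept <- dataset[!exclude, on = "ParticipantId"]

# Verification: number of excluded IDs still present in the result
n_left <- sum(kept$ParticipantId %in% exclude$ParticipantId)
```

If the real data were read with `read.csv`, they could be converted first with `setDT(dataset)` and `setDT(exclude)`.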
Upvotes: 1