Reputation: 23
I'm using R, and I'm trying to remove from one data frame every row whose values in certain columns also appear in a second data frame.
This is the first question I post here, so bear with me if you please ;)
I work with confidential data, so I recreated the problem with an example.
Name=c("Bussieres", "Nelson")
Fname=c("Paul", "Robert")
Tel=c(123,234)
comp1=data.frame(Name, Fname, Tel)
Name=c("Bussieres","Bussieres","Nelson","Nelson")
Fname=c("Robert","Paul","Paul","Paula")
Tel=c(123,234,345,456)
comp2=data.frame(Name, Fname, Tel)
comp1 returns:
Name Fname Tel
1 Bussieres Paul 123
2 Nelson Robert 234
comp2 returns:
Name Fname Tel
1 Bussieres Robert 123
2 Bussieres Paul 234
3 Nelson Paul 345
4 Nelson Paula 456
Now, what I want is to return the rows of comp1 whose combination of "Name" and "Fname" does not appear in any row of comp2.
The expected return, to be stored in a new dataframe comp3, would be (edited: I originally posted erroneous expected results):
Name Fname Tel
1 Nelson Robert 234
My first attempts used the match function, but that didn't quite work.
The following attempt at a for loop also didn't work.
for (i in comp1[,"Name"]){for (j in comp3[,"Name"]){if i!=j return comp3=x1["Name"==i,]}}
I'm surprised that I can't find basic (primitive) functions in R to do this, as excluding certain observations from a data set would be a very routine procedure.
Upvotes: 2
Views: 786
Reputation: 118779
A data.table solution:
require(data.table)
dt1 <- data.table(comp1, key=c("Name", "Fname"))
dt2 <- data.table(comp2, key=c("Name", "Fname"))
dt1[!dt2]
# Name Fname Tel
# 1: Nelson Robert 234
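If you would rather stay in base R, the same "not in" filter can be sketched by building a composite key from the two name columns and keeping the rows of comp1 whose key is absent from comp2. This is only a minimal sketch: the names key1 and key2 and the "\r" separator are my own choices for illustration, and it assumes the separator never occurs in the data.

```r
# Recreate the example data from the question.
comp1 <- data.frame(
  Name  = c("Bussieres", "Nelson"),
  Fname = c("Paul", "Robert"),
  Tel   = c(123, 234)
)
comp2 <- data.frame(
  Name  = c("Bussieres", "Bussieres", "Nelson", "Nelson"),
  Fname = c("Robert", "Paul", "Paul", "Paula"),
  Tel   = c(123, 234, 345, 456)
)

# Build one composite key per row from Name and Fname.
# "\r" is an arbitrary separator, assumed not to appear in the data.
key1 <- paste(comp1$Name, comp1$Fname, sep = "\r")
key2 <- paste(comp2$Name, comp2$Fname, sep = "\r")

# Keep only the rows of comp1 whose key does not occur in comp2.
comp3 <- comp1[!(key1 %in% key2), , drop = FALSE]
comp3
```

Here comp3 holds the single Nelson/Robert row, which keeps its original row name (2) from comp1. Unlike the keyed data.table not-join, this does not sort either table, but pasting keys together is only safe when the separator cannot show up inside a value.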
Upvotes: 6