Reputation: 49
I have a dataframe, data
, consisting of:
Name House Street Apt City Postal Phone
BARRY GORDON 106 SOME DR A PLACE Z4K4N2 999-828-7555
ANDERSON TAM 110 SOME DR A PLACE Z4K4N2 999-542-7555
BUCKLE J L S 117 SOME DR A PLACE Z4K4N2 999-212-7555
I have another dataframe, dnc
, read in from read_excel
:
Last Name Address # Street Name
Anderson Some Dr
Cibrian PO Box
Pistell PO Box
I want to exclude all rows that match dnc[,1]
and data$Street
AND the same Street Name in data. So 'Anderson' in data$Name
and 'Some' in data$Street
would delete that row. Notice the difference in capitals in both dfs. I tried manually/hardcode testing and still couldn't get what I wanted. I have been trying:
library(dplyr)
filter(data, dnc[,1] %in% data$Name & dnc[,3] %in% data$Street)
and
avector <- as.vector(dnc[,1])
data[data$Name %in% avector, ]
typeof(data$Name)
[1] "character"
Thanks
Upvotes: 0
Views: 127
Reputation: 28339
Can you test if this work:
library(dplyr)
# Modify entries
data$NameMod <- tolower(gsub(" .*", "", data$Name))
data$StrMod <- tolower(data$Street)
dnc$`Last Name` <- tolower(dnc$`Last Name`)
dnc$`Street Name` <- tolower(dnc$`Street Name`)
# Filter data using dplyr
filter(data, !NameMod %in% dnc$`Last Name` &
StrMod %in% dnc$`Street Name`)
Result using @ycw dataset:
Name House Street Apt City Postal Phone NameMod StrMod
1 BARRY GORDON 106 SOME DR A PLACE Z4K4N2 999-828-7555 barry some dr
2 BUCKLE J L S 117 SOME DR A PLACE Z4K4N2 999-212-7555 buckle some dr
I can't use data that you provided so had to write this blindly. First you have to modify your data (e.g. remove text that goes after Anderson
and turn everything to lower case), then filter by entries in dnc
.
Upvotes: 3
Reputation: 39154
I think you need regular expression to find matching string instead of %in%
.
The following is an example using grepl
and the dplyr
package.
library(dplyr)
data %>%
filter(!(grepl(paste0(unique(dnc$`Last Name`), collapse = "|"), Name, ignore.case = TRUE) &
grepl(paste0(unique(dnc$Address), collapse = "|"), Street, ignore.case = TRUE)))
Name House Street Apt City Postal Phone
1 BARRY GORDON 106 SOME DR A PLACE Z4K4N2 999-828-7555
2 BUCKLE J L S 117 SOME DR A PLACE Z4K4N2 999-212-7555
paste0(..., collapse = "|")
creates a regular expression pattern testing the elements with OR
. ignore.case = TRUE
makes the matching ignores the differences in cases. &
means two grepl
condition needs to both be satisfied. !
indicates the opposite condition.
DATA
library(dplyr)
data <- read.table(text = "Name House Street Apt City Postal Phone
'BARRY GORDON' 106 'SOME DR' A PLACE Z4K4N2 999-828-7555
'ANDERSON TAM' 110 'SOME DR' A PLACE Z4K4N2 999-542-7555
'BUCKLE J L S' 117 'SOME DR' A PLACE Z4K4N2 999-212-7555",
header = TRUE, stringsAsFactors = FALSE)
dnc <- read.table(text = "'Last Name' Address
Anderson 'Some Dr'
Cibrian 'PO Box'
Pistell 'PO Box'",
header = TRUE, stringsAsFactors = FALSE)
dnc <- dnc %>% rename(`Last Name` = Last.Name)
Upvotes: 2