user136508
user136508

Reputation: 49

Filter rows using %in%?

I have a dataframe, data, consisting of:

Name           House Street    Apt  City    Postal  Phone
BARRY GORDON    106  SOME DR       A PLACE  Z4K4N2  999-828-7555
ANDERSON TAM    110  SOME DR       A PLACE  Z4K4N2  999-542-7555
BUCKLE J L S    117  SOME DR       A PLACE  Z4K4N2  999-212-7555

I have another dataframe, dnc, read in from read_excel:

Last Name   Address #   Street Name
Anderson                  Some Dr
Cibrian       PO          Box
Pistell       PO          Box

I want to exclude all rows that match dnc[,1] and data$Street AND the same Street Name in data. So 'Anderson' in data$Name and 'Some' in data$Street would delete that row. Notice the difference in capitals in both dfs. I tried manually/hardcode testing and still couldn't get what I wanted. I have been trying:

library(dplyr)
filter(data, dnc[,1] %in% data$Name & dnc[,3] %in% data$Street)

and

avector <- as.vector(dnc[,1])
data[data$Name %in% avector, ] 

typeof(data$Name)
[1] "character"

Thanks

Upvotes: 0

Views: 127

Answers (2)

pogibas
pogibas

Reputation: 28339

Can you test if this work:

library(dplyr)

# Modify entries 
data$NameMod <- tolower(gsub(" .*", "", data$Name))
data$StrMod <- tolower(data$Street)
dnc$`Last Name` <- tolower(dnc$`Last Name`)
dnc$`Street Name` <- tolower(dnc$`Street Name`)

# Filter data using dplyr
filter(data, !NameMod %in% dnc$`Last Name` & 
             StrMod %in% dnc$`Street Name`)

Result using @ycw dataset:

          Name House  Street Apt  City Postal        Phone NameMod  StrMod
1 BARRY GORDON   106 SOME DR   A PLACE Z4K4N2 999-828-7555   barry some dr
2 BUCKLE J L S   117 SOME DR   A PLACE Z4K4N2 999-212-7555  buckle some dr

I can't use data that you provided so had to write this blindly. First you have to modify your data (e.g. remove text that goes after Anderson and turn everything to lower case), then filter by entries in dnc.

Upvotes: 3

www
www

Reputation: 39154

I think you need regular expression to find matching string instead of %in%.

The following is an example using grepl and the dplyr package.

library(dplyr)

data %>%
  filter(!(grepl(paste0(unique(dnc$`Last Name`), collapse = "|"), Name, ignore.case = TRUE) &
           grepl(paste0(unique(dnc$Address), collapse = "|"), Street, ignore.case = TRUE)))
          Name House  Street Apt  City Postal        Phone
1 BARRY GORDON   106 SOME DR   A PLACE Z4K4N2 999-828-7555
2 BUCKLE J L S   117 SOME DR   A PLACE Z4K4N2 999-212-7555

paste0(..., collapse = "|") creates a regular expression pattern testing the elements with OR. ignore.case = TRUE makes the matching ignores the differences in cases. & means two grepl condition needs to both be satisfied. ! indicates the opposite condition.

DATA

library(dplyr)

data <- read.table(text = "Name           House Street    Apt  City    Postal  Phone
'BARRY GORDON'    106  'SOME DR'       A PLACE  Z4K4N2  999-828-7555
                   'ANDERSON TAM'    110  'SOME DR'       A PLACE  Z4K4N2  999-542-7555
                   'BUCKLE J L S'    117  'SOME DR'       A PLACE  Z4K4N2  999-212-7555",
                   header = TRUE, stringsAsFactors = FALSE)

dnc <- read.table(text = "'Last Name'   Address
Anderson                  'Some Dr'
Cibrian       'PO          Box'
Pistell       'PO          Box'",
                  header = TRUE, stringsAsFactors = FALSE)

dnc <- dnc %>% rename(`Last Name` = Last.Name)

Upvotes: 2

Related Questions