Reputation: 71
I have one dataframe A that contains 2 columns - strings and codes:
String Code
Bacteria.111 abc1
Bacteria.111 abc2
Bacteria.111 abc3
Archaea.153 gmt1
Archaea.153 gmt2
I have another dataframe B which contains a list of what code the number following bacteria/archaea above should be:
Number Code
111 abc2
153 gmt1
What i'm trying to do is remove from dataframe A the rows which have the incorrect code for the number after bacteria/archaea. So e.g. the resulting dataframe from this should read:
String Code
Bacteria.111 abc2
Archaea.153 gmt1
I've tried using %in% but this uses an exact match so nothing matches. All I need is a match for the number which is always at the end of the string and always follows a period. I'm lost on how to accomplish this.
Upvotes: 1
Views: 129
Reputation: 39747
You can use %in%
when you paste the columns and use sub
to remove everything before the .
.
A[paste(sub(".*\\.", "", A$String), A$Code) %in% paste(B$Number, B$Code),]
# String Code
#2 Bacteria.111 abc2
#4 Archaea.153 gmt1
Upvotes: 2
Reputation: 1579
Try this:
A <- A %>%
mutate(
Number = as.integer(stringr::str_sub(String, start = stringi::stri_locate_last_fixed(String, ".")[,2] + 1)) # get the code
) %>%
left_join(B %>% rename(Code2 = Code), by = "Number") %>% # join data frames
filter(Code == Code2) %>% # compare and filter
select(-Number, -Code2)
Upvotes: 1