Reputation: 123
I have a data frame with two columns, speciality and keywords. I use the following code to extract the value from column keywords when the search term matches a value in column speciality:
speciality <- c("Emergency medicine","Allergology","Anesthesiology","Hematology","Cardiology")
keywords <- c("emergency room OR emergency medicine OR emergency department",
"Allergy OR rhinitis OR asthma OR atopic eczema",
"Pain OR local anaesthesia OR general anaesthesia OR induced sleep",
"Anemia OR bleeding disorders OR hemophilia OR blood cancers",
"Heart OR cardiac diseases OR Cardiomyopathy OR Congenital Heart Disease OR Cardiac Arrhythmia")
sample <- data.frame(speciality, keywords)
keyspecial <- "Allergology"
subkeywords <- subset(sample$keywords, sample$speciality==keyspecial)
View(subkeywords)
So I am searching for Allergology in column speciality. Once I run the code, I get:
Allergy OR rhinitis OR asthma OR atopic eczema
The issue I am facing is that if I search for allergology instead of Allergology, I don't get any results. The same happens if I search with just emergency instead of Emergency medicine.
Any suggestions?
Upvotes: 0
Views: 608
Reputation: 317
You can lowercase each speciality and split it into words, then check whether the lowercased search term is one of those words:
# strsplit() is vectorised and always returns a list: one vector of lowercase words per speciality
matchList <- strsplit(tolower(speciality), split = " ")
keyspecial <- "Allergology"
subkeywords <- subset(sample$keywords,sapply(matchList,function(y){any(tolower(keyspecial) %in% y)}))
View(subkeywords)
keyspecial <- "allergology"
subkeywords <- subset(sample$keywords,sapply(matchList,function(y){any(tolower(keyspecial) %in% y)}))
View(subkeywords)
Upvotes: 0
Reputation: 9656
Change this line:
subkeywords <- subset(sample$keywords, sample$speciality==keyspecial)
To this one:
subkeywords <- subset(sample$keywords, grepl(keyspecial, sample$speciality, ignore.case=TRUE))
This works because grepl has an ignore.case parameter that can be set to TRUE to ignore case. However, it also finds partial matches: a search for Allergology will also match The Allergology and similar strings.
To match only the complete string, still ignoring case, you can use this instead:
subkeywords <- subset(sample$keywords, tolower(sample$speciality)==tolower(keyspecial))
This way both strings are converted to lowercase before they are compared.
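For comparison, here is a minimal sketch that puts the two behaviours described above (partial match and whole-string match) side by side. The helper find_keywords and its whole_string argument are hypothetical names introduced only for this illustration. One deliberate difference from the grepl line above: the search term is matched literally (fixed = TRUE on lowercased strings) rather than as a regular expression, so characters such as "." or "(" in the term are not interpreted as regex syntax.

```r
# Sample data from the question
speciality <- c("Emergency medicine", "Allergology", "Anesthesiology",
                "Hematology", "Cardiology")
keywords <- c("emergency room OR emergency medicine OR emergency department",
              "Allergy OR rhinitis OR asthma OR atopic eczema",
              "Pain OR local anaesthesia OR general anaesthesia OR induced sleep",
              "Anemia OR bleeding disorders OR hemophilia OR blood cancers",
              "Heart OR cardiac diseases OR Cardiomyopathy OR Congenital Heart Disease OR Cardiac Arrhythmia")
sample <- data.frame(speciality, keywords)

# Hypothetical helper (not part of the original answer).
# whole_string = FALSE: case-insensitive substring match, term taken literally.
# whole_string = TRUE:  case-insensitive match of the entire speciality.
find_keywords <- function(term, data, whole_string = FALSE) {
  spec <- tolower(as.character(data$speciality))
  term <- tolower(term)
  if (whole_string) {
    hit <- spec == term
  } else {
    hit <- grepl(term, spec, fixed = TRUE)
  }
  as.character(data$keywords[hit])
}

find_keywords("allergology", sample)                      # partial match
find_keywords("emergency", sample)                        # partial match
find_keywords("emergency", sample, whole_string = TRUE)   # whole-string match
```

With the sample data, the first two calls each find one row, while the last call finds none, because no speciality is exactly "emergency".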
Upvotes: 2
Reputation: 5893
You could use str_detect and ignore case:
library(tidyverse)
keyspecial <- "allergology"
sample %>%
  filter(str_detect(speciality, fixed(keyspecial, ignore_case = TRUE)))
Upvotes: 1