amty

Reputation: 123

R: subset a cell value from one column based on a match in another column

I have a data frame with two columns, speciality and keywords. I use the following code to extract the value from column keywords when a search term matches a value in column speciality:

speciality <- c("Emergency medicine","Allergology","Anesthesiology","Hematology","Cardiology")
keywords <- c("emergency room OR emergency medicine OR emergency department", 
          "Allergy OR rhinitis OR asthma OR atopic eczema", 
          "Pain OR local anaesthesia OR general anaesthesia OR induced sleep", 
          "Anemia OR bleeding disorders OR hemophilia OR blood cancers", 
          "Heart OR cardiac diseases OR Cardiomyopathy OR Congenital Heart Disease OR Cardiac Arrhythmia")
sample <- data.frame(speciality, keywords)
keyspecial <- "Allergology"
subkeywords <- subset(sample$keywords, sample$speciality==keyspecial)
View(subkeywords)

So I am searching for Allergology in column speciality. When I run the code I get Allergy OR rhinitis OR asthma OR atopic eczema.

The problem is that if I search for allergology instead of Allergology, I get no results. The same happens if I search for just emergency instead of Emergency medicine.
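A minimal illustration of why the original line finds nothing: `==` compares each whole string exactly, so both the case and the full text have to match. The data is trimmed to two values from the question.

```r
speciality <- c("Emergency medicine", "Allergology")

# `==` is an exact, case-sensitive comparison of whole strings
speciality == "Allergology"   # FALSE TRUE  (exact spelling matches)
speciality == "allergology"   # FALSE FALSE (lowercase first letter: no match)
speciality == "emergency"     # FALSE FALSE (only part of "Emergency medicine": no match)
```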

Any suggestions?

Upvotes: 0

Views: 608

Answers (3)

HolgerBarlt

Reputation: 317

You can try lowercasing each speciality and splitting it into words, then checking whether the lowercased search term is one of those words:

# Lowercase each speciality and split it into words (lapply always returns a list)
matchList <- lapply(speciality, function(x) strsplit(tolower(x), split = " ")[[1]])
keyspecial <- "Allergology"
subkeywords <- subset(sample$keywords, sapply(matchList, function(y) tolower(keyspecial) %in% y))
View(subkeywords)
# The lowercase search term now gives the same result
keyspecial <- "allergology"
subkeywords <- subset(sample$keywords, sapply(matchList, function(y) tolower(keyspecial) %in% y))
View(subkeywords)
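A minimal sketch of this approach wrapped in a hypothetical helper, `find_keywords_by_word()` (not part of the original answer), using three rows from the question's data. It also shows the trade-off: the search term is compared against single words, so a multi-word term such as Emergency medicine finds nothing.

```r
speciality <- c("Emergency medicine", "Allergology", "Cardiology")
keywords <- c("emergency room OR emergency medicine OR emergency department",
              "Allergy OR rhinitis OR asthma OR atopic eczema",
              "Heart OR cardiac diseases OR Cardiomyopathy OR Congenital Heart Disease OR Cardiac Arrhythmia")
sample <- data.frame(speciality, keywords, stringsAsFactors = FALSE)

# Hypothetical helper: a row matches if the lowercased term equals
# any single lowercased word of that row's speciality
find_keywords_by_word <- function(df, term) {
  words <- lapply(df$speciality, function(x) strsplit(tolower(x), split = " ")[[1]])
  hit <- vapply(words, function(w) tolower(term) %in% w, logical(1))
  df$keywords[hit]
}

find_keywords_by_word(sample, "allergology")         # the Allergology keywords
find_keywords_by_word(sample, "emergency")           # the Emergency medicine keywords
find_keywords_by_word(sample, "Emergency medicine")  # character(0): two words never equal one word
```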

Upvotes: 0

Karolis Koncevičius

Reputation: 9656

Change this line:

subkeywords <- subset(sample$keywords, sample$speciality==keyspecial)

To this one:

subkeywords <- subset(sample$keywords, grepl(keyspecial, sample$speciality, ignore.case=TRUE))

This works because grepl has an ignore.case argument; setting it to TRUE makes the match case-insensitive. However, grepl looks for partial matches: searching for Allergology will also match The Allergology and similar values. The same behaviour is what lets a short term like emergency find Emergency medicine.
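A short sketch of the partial-match behaviour. The value "The Allergology" is hypothetical, added only to show a partial match. Note that grepl treats its pattern as a regular expression by default, so anchoring it with `^` and `$` requires the whole value to match; this assumes the search term contains no regex metacharacters such as `.` or `+`.

```r
# "The Allergology" is a hypothetical value added to show a partial match
speciality <- c("Emergency medicine", "Allergology", "The Allergology", "Cardiology")

# Case-insensitive substring search: any value containing the term matches
grepl("emergency", speciality, ignore.case = TRUE)    # TRUE FALSE FALSE FALSE
grepl("allergology", speciality, ignore.case = TRUE)  # FALSE TRUE TRUE FALSE

# Anchored pattern: the whole value must match (term assumed regex-safe)
grepl("^allergology$", speciality, ignore.case = TRUE)  # FALSE TRUE FALSE FALSE
```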

To match only the complete value (while still ignoring case), use this instead:

subkeywords <- subset(sample$keywords, tolower(sample$speciality)==tolower(keyspecial))

This converts both strings to lowercase before comparing them, so the whole value must still match but case no longer matters.
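A small sketch of the trade-off with this exact comparison, using two rows from the question's data: the lowercase term now finds Allergology, but a partial term such as emergency is still rejected because the whole value has to match.

```r
speciality <- c("Emergency medicine", "Allergology")
keywords <- c("emergency room OR emergency medicine OR emergency department",
              "Allergy OR rhinitis OR asthma OR atopic eczema")
sample <- data.frame(speciality, keywords, stringsAsFactors = FALSE)

# Case-insensitive, whole-value comparison
match_exact <- tolower(sample$speciality) == tolower("allergology")
subset(sample$keywords, match_exact)  # the Allergology keywords

# A partial term finds nothing, because the whole value must match
subset(sample$keywords, tolower(sample$speciality) == tolower("emergency"))  # character(0)
```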

Upvotes: 2

erocoar

Reputation: 5893

You could use str_detect from stringr with a fixed() pattern and ignore_case = TRUE:

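library(tidyverse)  # loads dplyr (filter, pull) and stringr (str_detect, fixed)
keyspecial <- "allergology"

sample %>% 
  filter(str_detect(speciality, fixed(keyspecial, ignore_case = TRUE))) %>% 
  pull(keywords)  # return just the keywords, as in the question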
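If you would rather not load the tidyverse, here is a base-R sketch that, for plain ASCII text like this, behaves much like `str_detect(x, fixed(term, ignore_case = TRUE))`: lowercase both sides, then do a literal substring search with `fixed = TRUE`. It is an alternative, not part of the original answer, and it uses three rows from the question's data.

```r
speciality <- c("Emergency medicine", "Allergology", "Cardiology")
keywords <- c("emergency room OR emergency medicine OR emergency department",
              "Allergy OR rhinitis OR asthma OR atopic eczema",
              "Heart OR cardiac diseases OR Cardiomyopathy OR Congenital Heart Disease OR Cardiac Arrhythmia")
sample <- data.frame(speciality, keywords, stringsAsFactors = FALSE)

# Lowercase both sides, then search for the term literally (no regex)
keyspecial <- "emergency"
hit <- grepl(tolower(keyspecial), tolower(sample$speciality), fixed = TRUE)

sample[hit, ]         # rows whose speciality contains "emergency"
sample$keywords[hit]  # just the keywords
```

With `fixed = TRUE`, characters such as `.` or `(` in the search term are matched literally rather than as regex syntax.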

Upvotes: 1
