Reputation: 402
I applied case_when to a text column with thousands of rows to detect strings under multiple conditions and replace them, but got the wrong result because case_when stops evaluating the remaining conditions for a row once one condition is met. I have seen a solution in "How to detect more than one regex in a case_when statement", but it does not handle several groups of multiple conditions, as in my data.
Any alternative to case_when would be appreciated.
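To illustrate the behavior described above, here is a minimal sketch, assuming dplyr is installed. The text in x is a hypothetical example that satisfies two conditions at once; the labels "first" and "second" are placeholders.

```r
library(dplyr)

# Hypothetical text that satisfies both conditions below
x <- "police transport"

# Both grepl() calls are TRUE, but case_when() keeps only the first match
result <- case_when(
  grepl("police", x)    ~ "first",
  grepl("transport", x) ~ "second",
  TRUE                  ~ NA_character_
)
result
```

Because the branches are checked in order, the order of the conditions decides which label a row with several matches receives.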
This is the dummy data:
statement <- structure(list(stmt = c("diabetes is common", "police not my friend",
"transport is cheap", "english is my language", "education is my right")),
class = "data.frame", row.names = c(NA, -5L))
I tried to adapt the first solution in "How to detect more than one regex in a case_when statement" but could not get it to work.
I want to detect strings in the text in column stmt and recode the column into these five domains: APC, PDP, APGA, APP and SDP. Below are the strings to be detected:
APC <- c("addiction|mental|Diabetes|health|healthy|Oranga|unwell|AOD| well| surgery|dental|recovery|oranga|Mirimiri|asthma|anger|checks|alcohol|pregnant|clinical|clinic")
PDP <- c("whanau direct|whānau direct|money|transport|home|repairs|social|budget|job|housing|house|financial|finance|Ohanga|furniture|accommodation|welfare|living|work|babies arrival|AT hop card|Entitlements|ohunga|bills|electricity|water|employment")
APGA <- c("Kaupapa|Te reo|language|Tikanga|Iwi|relationship|Tikinga|Reunite")
APP <- c("Studying|training|NCEA|ECE|Counseling|counsel|Knowledge|School|Education|matauranga|parenting|skills")
rangatiratanga <- c("self-management|Rangitiratanga|custody|police|court|CYFS|advocacy|Oranga Tamariki|rangatiratanga|section 101|EPOA|Familly issues")
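One possible alternative to case_when, sketched here in base R only, is to test every pattern against every row and keep all the domains that match, not only the first. This is an illustration under assumptions, not a definitive solution: the patterns are shortened stand-ins for the full lists above, the sixth row is a hypothetical statement added to show a double match, and the column name domains is made up for the example.

```r
# Dummy data from the question plus one hypothetical row that fits two domains
statement <- data.frame(
  stmt = c("diabetes is common", "police not my friend", "transport is cheap",
           "english is my language", "education is my right",
           "police checked my health"),
  stringsAsFactors = FALSE
)

# Shortened, illustrative patterns (not the full lists from the question)
patterns <- c(
  APC            = "diabetes|health|clinic",
  PDP            = "transport|money|housing",
  APGA           = "language|Te reo",
  APP            = "education|school",
  rangatiratanga = "police|court|custody"
)

# Logical matrix: one row per statement, one column per domain
hits <- sapply(patterns, grepl, x = statement$stmt, ignore.case = TRUE)

# Collapse every matching domain for a row into one string; NA if none match
statement$domains <- apply(hits, 1, function(row) {
  matched <- names(patterns)[row]
  if (length(matched) == 0) NA_character_ else paste(matched, collapse = ", ")
})

statement$domains
```

The logical matrix hits can also be kept as is, with one column per domain, if separate indicator columns are more useful than a single combined label.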
Upvotes: 1
Views: 754
Reputation: 402
Thanks to @Tim Biegeleisen. However, detecting strings with case_when() and grepl() alone may miss matches when the letter case differs, because grepl() is case-sensitive by default. Passing ignore.case = TRUE to grepl() makes the matching case-insensitive, as in the code below:
library(dplyr)  # provides case_when()

statement$col <- case_when(
  grepl("(addiction|mental|Diabetes|health|healthy|Oranga|unwell|AOD| well| surgery|dental|recovery|oranga|Mirimiri|asthma|anger|checks|alcohol|pregnant|clinical|clinic)", statement$stmt, ignore.case = TRUE) ~ "APC",
  grepl("(whanau direct|whānau direct|money|transport|home|repairs|social|budget|job|housing|house|financial|finance|Ohanga|furniture|accommodation|welfare|living|work|babies arrival|AT hop card|Entitlements|ohunga|bills|electricity|water|employment)", statement$stmt, ignore.case = TRUE) ~ "PDP",
  grepl("(Kaupapa|Te reo|language|Tikanga|Iwi|relationship|Tikinga|Reunite)", statement$stmt, ignore.case = TRUE) ~ "APGA",
  grepl("(Studying|training|NCEA|ECE|Counseling|counsel|Knowledge|School|Education|matauranga|parenting|skills)", statement$stmt, ignore.case = TRUE) ~ "APP",
  grepl("(self-management|Rangitiratanga|custody|police|court|CYFS|advocacy|Oranga Tamariki|rangatiratanga|section 101|EPOA|Familly issues)", statement$stmt, ignore.case = TRUE) ~ "rangatiratanga",
  TRUE ~ NA_character_
)
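A small base R check of the effect of ignore.case, using two rows from the dummy data. The last two lines use a hypothetical text, "payment received", to sketch one side effect: with case ignored, a short token such as ECE can also match inside a longer word, and word boundaries (\\b) are one way to limit that.

```r
x <- c("diabetes is common", "education is my right")

# Case-sensitive by default: the capitalized patterns do not match
sensitive <- grepl("Diabetes|Education", x)
sensitive    # FALSE FALSE

# Case-insensitive: both rows match
insensitive <- grepl("Diabetes|Education", x, ignore.case = TRUE)
insensitive  # TRUE TRUE

# Side effect on a hypothetical text: "ECE" matches inside "received"
loose  <- grepl("ECE", "payment received", ignore.case = TRUE)        # TRUE
# Word boundaries restrict the match to the standalone token
strict <- grepl("\\bECE\\b", "payment received", ignore.case = TRUE)  # FALSE
```

Whether word boundaries are appropriate depends on the data, since they would also stop a pattern such as health from matching "healthy".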
Upvotes: 0
Reputation: 520978
You may use case_when with grepl and a regex alternation:
statement$col <- case_when(
grepl("(addiction|mental|Diabetes|health|healthy|Oranga|unwell|AOD| well| surgery|dental|recovery|oranga|Mirimiri|asthma|anger|checks|alcohol|pregnant|clinical|clinic)", statement$stmt) ~ "APC",
grepl("(whanau direct|whānau direct|money|transport|home|repairs|social|budget|job|housing|house|financial|finance|Ohanga|furniture|accommodation|welfare|living|work|babies arrival|AT hop card|Entitlements|ohunga|bills|electricity|water|employment)", statement$stmt) ~ "PDP",
grepl("(Kaupapa|Te reo|language|Tikanga|Iwi|relationship|Tikinga|Reunite)", statement$stmt) ~ "APGA",
grepl("(Studying|training|NCEA|ECE|Counseling|counsel|Knowledge|School|Education|matauranga|parenting|skills)", statement$stmt) ~ "APP",
grepl("(self-management|Rangitiratanga|custody|police|court|CYFS|advocacy|Oranga Tamariki|rangatiratanga|section 101|EPOA|Familly issues)", statement$stmt) ~ "rangatiratanga",
TRUE ~ NA_character_
)
Upvotes: 1