String scan and match with respect to group in R

Question

I am very new to R programming, and I am working with some data that is collected daily from a group of people. The data usually has the format:

name, DOB, HF, LGA

in text format, and each message becomes one element of the character vector

 text <- c()
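An HF name can itself contain commas (as in "phc,makirin" in the sample below), so splitting a message on commas does not always give exactly four fields. A minimal sketch, assuming the first field is always the name, the second the DOB and the last the LGA, joins everything in between back into the HF. `parse_message()` and the column names `name`, `dob`, `hf`, `lga` are hypothetical, not part of any package.

```r
# Hypothetical helper: split one raw message into name, DOB, HF and LGA.
# Assumes field 1 = name, field 2 = DOB, last field = LGA, and that any
# fields in between are pieces of the HF name (which may contain commas).
parse_message <- function(msg) {
  parts <- trimws(strsplit(msg, ",", fixed = TRUE)[[1]])
  n <- length(parts)
  if (n < 4) {
    # Too few fields to apply the rule above: return NAs instead of guessing
    return(c(name = NA_character_, dob = NA_character_,
             hf = NA_character_, lga = NA_character_))
  }
  c(name = parts[1],
    dob  = parts[2],
    hf   = paste(parts[3:(n - 1)], collapse = " "),
    lga  = parts[n])
}

text <- c("first person Usman,03May2019,Ntade Health post,LGA1",
          "second person, 7may2019,phc,makirin, LGA2")

# One row per message, one column per field
records <- as.data.frame(t(vapply(text, parse_message, character(4))),
                         stringsAsFactors = FALSE)
rownames(records) <- NULL
records
```

The rule "everything between DOB and LGA is the HF" only holds if senders keep name, DOB and LGA free of commas; messages that break it will need manual review.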

Each HF is linked to a database for its LGA (there are 10 LGAs in total); that is, each LGA is a group of HFs.
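One way to hold "each LGA is a group of HFs" in R is a named list with one character vector per LGA. The sketch below uses the placeholder keys `LGA1`/`LGA2` from the sample data; the "Hypothetical Facility" names are invented stand-ins for the real database. If the HF database is available as a two-column data frame (hypothetical columns `lga` and `hf`), `split()` builds the same list.

```r
# Named list: one element per LGA, holding that LGA's standard HF names.
# Keys follow the sample data; "Hypothetical Facility ..." names are invented
# placeholders until the real HF database is used.
hf_by_lga <- list(
  LGA1 = c("Ntade Health post", "Hypothetical Facility A"),
  LGA2 = c("Phc Makirine", "Hypothetical Facility B")
)

# All candidate HFs for one LGA
hf_by_lga[["LGA2"]]

# Same structure built from a two-column table (hypothetical columns lga, hf)
hf_db <- data.frame(
  lga = c("LGA1", "LGA1", "LGA2", "LGA2"),
  hf  = c("Ntade Health post", "Hypothetical Facility A",
          "Phc Makirine", "Hypothetical Facility B"),
  stringsAsFactors = FALSE
)
hf_by_lga2 <- split(hf_db$hf, hf_db$lga)
```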

Because compliance with the format is low, the HF names usually contain many spelling errors.

Here is a sample of the data:

 "first person Usman,03May2019,Ntade Health post,LGA1"
 "second person, 7may2019,phc,makirin, LGA2"

Here, "phc,makirin" should be spelt "Phc Makirine".
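An error like this one mixes punctuation with a small letter difference. A sketch of one common approach, assuming case and punctuation carry no meaning in HF names: normalise both strings, then measure the edit distance with base R's `utils::adist()`. The helper `normalise_hf()` is hypothetical.

```r
# Hypothetical helper: canonical form of an HF name before comparison
normalise_hf <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", " ", x)   # commas, dots, slashes -> space
  x <- gsub("[[:space:]]+", " ", x)   # collapse runs of whitespace
  trimws(x)
}

raw      <- normalise_hf("phc,makirin")    # "phc makirin"
standard <- normalise_hf("Phc Makirine")   # "phc makirine"

# Generalized Levenshtein distance: the minimal number of single-character
# insertions, deletions and substitutions turning one string into the other
utils::adist(raw, standard)   # 1 x 1 matrix; here one insertion ("e")
```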

Since there are only a few LGAs, I have been able to extract them with stringr pattern matching that covers the misspellings usually seen:

# LGA vector: one standardized LGA name per message (NA where nothing matches)
library(stringr)
# text_from_optin is the character vector of raw messages
LGA <- rep(NA_character_, length(text_from_optin))
LGA[str_detect(text_from_optin, regex("Alier|Aleiro|Alero", ignore_case = TRUE))] <- "ALIERO"
LGA[str_detect(text_from_optin, regex("Augie|Agie|Auge|Auggie?", ignore_case = TRUE))] <- "AUGIE"
LGA[str_detect(text_from_optin, regex("Bagudo", ignore_case = TRUE))] <- "BAGUDO"
# A lone backslash such as "B\Kebbi" is an invalid escape in an R string;
# [/\\\\] matches either "/" or "\" between the abbreviated words
LGA[str_detect(text_from_optin, regex("Bir?nin Kebb?i|BirninKebn?i|B[/\\\\]Kebb?i|Binin|birninkebbi", ignore_case = TRUE))] <- "BIRNIN KEBBI"
LGA[str_detect(text_from_optin, regex("Dan?di", ignore_case = TRUE))] <- "DANDI"
LGA[str_detect(text_from_optin, regex("Danko?wasa|Wasagu|D[/\\\\]Was|Dankowasagu|Danko", ignore_case = TRUE))] <- "DANKO WASAGU"
LGA[str_detect(text_from_optin, regex("Fakai", ignore_case = TRUE))] <- "FAKAI"
LGA[str_detect(text_from_optin, regex("Gw?andu", ignore_case = TRUE))] <- "GWANDU"
LGA[str_detect(text_from_optin, regex("Kalg", ignore_case = TRUE))] <- "KALGO"
LGA[str_detect(text_from_optin, regex("Koko Bes|K[/\\\\]Bes|Kokobess?", ignore_case = TRUE))] <- "KOKO BESSE"
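The ten nearly identical lines above can be driven from a single named vector of patterns, which makes adding a misspelling or an LGA a one-line change. This sketch reuses the question's patterns (with `[/\\\\]` in place of the invalid single backslash) and uses base `grepl(ignore.case = TRUE)` instead of `stringr::str_detect()` so it has no package dependency; for these simple patterns the two should behave the same. `lga_patterns` and `detect_lga()` are hypothetical names.

```r
# Hypothetical table: standard LGA name -> regex of spellings seen in the data
# (patterns copied from the question; order matters because a later match
# overwrites an earlier one, exactly as in the line-by-line version)
lga_patterns <- c(
  "ALIERO"       = "Alier|Aleiro|Alero",
  "AUGIE"        = "Augie|Agie|Auge|Auggie?",
  "BAGUDO"       = "Bagudo",
  "BIRNIN KEBBI" = "Bir?nin Kebb?i|BirninKebn?i|B[/\\\\]Kebb?i|Binin|birninkebbi",
  "DANDI"        = "Dan?di",
  "DANKO WASAGU" = "Danko?wasa|Wasagu|D[/\\\\]Was|Dankowasagu|Danko",
  "FAKAI"        = "Fakai",
  "GWANDU"       = "Gw?andu",
  "KALGO"        = "Kalg",
  "KOKO BESSE"   = "Koko Bes|K[/\\\\]Bes|Kokobess?"
)

# Return one LGA name per message, NA where no pattern matches
detect_lga <- function(text, patterns) {
  out <- rep(NA_character_, length(text))
  for (lga in names(patterns)) {
    hit <- grepl(patterns[[lga]], text, ignore.case = TRUE)
    out[hit] <- lga
  }
  out
}

detect_lga(c("a,1jan2019,hf,Aleiro", "b,2feb2019,hf,D/Was"), lga_patterns)
```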

For example, the Aliero LGA alone has about 200 HFs, each with a standard spelling.

What I am trying to do is populate the vector

Hf <- c()

with the correct (standard) spelling of each HF, looked up among the HFs of its LGA.

Is there a way to say:

For each LGA found in the text, check whether any HF belonging to that LGA matches; if one does, store its standard spelling in Hf.
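A minimal sketch of that loop, assuming the HF and LGA fields have already been extracted and that the standard HF names are stored as a named list keyed by LGA (the "Hypothetical Facility" entries are placeholders, not real data). For each message it compares the raw HF only against the HFs of that message's LGA and keeps the closest one by edit distance (`utils::adist()`), returning `NA` when the LGA is missing or unknown, or when the best candidate is more than `max_dist` edits away. `hf_by_lga`, `normalise_hf()` and `match_hf()` are hypothetical names.

```r
# Hypothetical lookup table: standard HF names grouped by LGA (placeholders)
hf_by_lga <- list(
  LGA1 = c("Ntade Health post", "Hypothetical Facility A"),
  LGA2 = c("Phc Makirine", "Hypothetical Facility B")
)

# Hypothetical helper: lower case, punctuation -> space, single spaces
normalise_hf <- function(x) {
  x <- gsub("[[:punct:]]+", " ", tolower(x))
  trimws(gsub("[[:space:]]+", " ", x))
}

# For each message, compare the raw HF only with the HFs of its own LGA and
# return the closest standard name, or NA if the LGA is missing/unknown or
# the closest name is more than max_dist edits away.
match_hf <- function(raw_hf, lga, lookup, max_dist = 3) {
  vapply(seq_along(raw_hf), function(i) {
    if (is.na(lga[i]) || is.na(raw_hf[i])) return(NA_character_)
    candidates <- lookup[[lga[i]]]
    if (is.null(candidates)) return(NA_character_)
    d <- utils::adist(normalise_hf(raw_hf[i]), normalise_hf(candidates))
    best <- which.min(d)            # first candidate wins on ties
    if (d[best] <= max_dist) candidates[best] else NA_character_
  }, character(1))
}

# Raw fields as they appear in the sample messages
raw_hf <- c("Ntade Health post", "phc,makirin")
lga    <- c("LGA1", "LGA2")

Hf <- match_hf(raw_hf, lga, hf_by_lga)
Hf
```

Restricting the comparison to one LGA keeps each search to roughly 200 candidates and stops a misspelt name from matching a similar facility in another LGA. The cut-off `max_dist = 3` is an arbitrary choice; for long facility names a threshold relative to the name length may work better.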

Can someone please help me out? Thanks.
