rastrast

Reputation: 327

regex in R to distinguish male/female in strings

I have strings with descriptions of gender that I need to sort out. For instance, if I have the following,

string1 = "FEMALE AND FEMALE"
string2 = "FEMALE AND MALE"

I need to change string1 to say "MULTIPLE FEMALES", and string2 to say "BOTH MALE AND FEMALE".

Using gsub, I am having trouble writing a substitution that recognizes string2 as different from string1, because MALE is nested in FEMALE. Using "YEP" as a confirmation string first, I have tried the following with no luck,

gsub(".*FEMALE.*MALE.*", "YEP", string1)
gsub(".*FEMALE.*[^M]ALE.*", "YEP", string1)
gsub(".*FEMALE.*[^\b]MALE.*", "YEP", string1)
gsub(".*FEMALE.*(^\bMALE).*", "YEP", string1)
gsub(".*FEMALE.*MALE.*", "YEP", string2)
gsub(".*FEMALE.*[^M]ALE.*", "YEP", string2)
gsub(".*FEMALE.*[^\b]MALE.*", "YEP", string2)
gsub(".*FEMALE.*(^\bMALE).*", "YEP", string2)

I need the wildcards around the keywords because not all strings look like "FEMALE AND FEMALE" or "FEMALE AND MALE"; sometimes they show up as "1 FEMALE 12 MALES" or "B FEMALE WITH 2X W FEMALE", etc.

Any ideas on how to deal with nested strings using regex?

Upvotes: 1

Views: 471

Answers (1)

rastrast

Reputation: 327

Ok, I figured this out right after I posted.

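Running gsub(".*FEMALE.*\\b(M)ALE.*", "YEP", string1) returns "FEMALE AND FEMALE" unchanged, whereas gsub(".*FEMALE.*\\b(M)ALE.*", "YEP", string2) returns "YEP". The \\b is a word boundary, so MALE only matches at the start of a word and never inside FEMALE. The backslash has to be doubled inside an R string; a single "\b" is a backspace character, which is one reason my earlier attempts with "\b" did not act as a word boundary. The parentheses around M are not needed. So this works.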
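
One caveat: the pattern above requires FEMALE to come before MALE, so "MALE AND FEMALE" would not match. A possible order-independent variant is to count whole-word matches instead. This is only a sketch: classify_gender is a hypothetical helper name, the S? for plurals is my assumption based on the "12 MALES" example, and leaving other strings unchanged is a placeholder since the question only defines two output labels.

```r
# Hypothetical helper: classify one description by counting whole-word matches.
classify_gender <- function(x) {
  count_words <- function(pattern) {
    hits <- gregexpr(pattern, x, perl = TRUE)[[1]]
    sum(hits > 0)  # gregexpr returns -1 when there is no match
  }
  # \\b is a word boundary, so the MALE inside FEMALE is not counted.
  # S? also accepts the plurals FEMALES and MALES (an assumption).
  n_female <- count_words("\\bFEMALES?\\b")
  n_male   <- count_words("\\bMALES?\\b")

  if (n_female > 0 && n_male > 0) {
    "BOTH MALE AND FEMALE"
  } else if (n_female > 1) {
    "MULTIPLE FEMALES"
  } else {
    x  # placeholder: strings outside the two cases are returned unchanged
  }
}

classify_gender("FEMALE AND FEMALE")          # "MULTIPLE FEMALES"
classify_gender("MALE AND FEMALE")            # "BOTH MALE AND FEMALE"
classify_gender("B FEMALE WITH 2X W FEMALE")  # "MULTIPLE FEMALES"
```

The function handles one string at a time; for a character vector it could be applied with vapply(strings, classify_gender, character(1)).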

Upvotes: 1
