Reputation: 109
I need to use regular expressions to match the title in a name. I want an R snippet that creates a new column called “Female” and fills it with TRUE/FALSE values based on the text in the “Name” column. For example, if the title is "Miss" the value should be TRUE, and if there is no salutation it should be NA.
This is the data frame and my attempt:
df <- data.frame(PersonID=1:8, Name=c("Mr. Bob", "Ms. Blank", "Roger, Mr.", "MR Mark Simpson", "Miss Lisa", "Mrs. joshep", "Rakesh Kumar", "Kumar Gums Murphy"))
grepl("Miss", df, perl=TRUE)
output:
FALSE,FALSE,FALSE
expected output:
FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,NA,NA
Can anyone please help me on this?
Upvotes: 2
Views: 139
Reputation: 834
If you want NA for names with no title, you first have to rule out that any other designation is present. That is, just because "Miss" is not present doesn't mean that "Mr" or "MISS" is absent too.
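As a quick illustration of that point (a minimal sketch that rebuilds the data frame from the question; the variable name miss is just for this example), matching a single title on the Name column gives FALSE, never NA, for the names that carry no title at all:

```r
df <- data.frame(PersonID = 1:8,
                 Name = c("Mr. Bob", "Ms. Blank", "Roger, Mr.", "MR Mark Simpson",
                          "Miss Lisa", "Mrs. joshep", "Rakesh Kumar", "Kumar Gums Murphy"))

# Match against the Name column, not the whole data frame
miss <- grepl("Miss", df$Name)

# grepl() only ever returns TRUE or FALSE, so untitled names come back FALSE
print(miss)
```

So the NA has to be assigned separately, after checking every known title.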
The following will assign "M", "F" or NA in your example. Please add designations as needed.
Titles <- c("Miss", "Ms","Mr","Mrs","MR","MS","MRS","MISS") # vector of possible titles
f.Titles <- c("Miss", "Ms","Mrs","MS","MRS","MISS") # vector of female specific titles
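# Build a logical matrix: one row per name, one column per title
check <- NULL
for (i in seq_along(Titles)) {
  check <- cbind(check, grepl(Titles[i], df$Name, perl = TRUE))
}
colnames(check) <- Titles
# Per row: NA if no title matched, "F" if any matched title is female, else "M"
apply(check, 1, function(x) ifelse(!any(x), NA,
  ifelse(any(names(which(x)) %in% f.Titles), "F", "M")))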
Output :
[1] "M" "F" "M" "M" "F" "F" NA NA
From there it's a simple step to get the logical column:
G <- apply(check,1,function(x)ifelse(!any(x),NA,
ifelse(any(names(which(x)) %in% f.Titles),"F","M")))
df$Female <- ifelse(G=="F",TRUE,ifelse(is.na(G),NA,FALSE))
df
  PersonID              Name Female
1        1           Mr. Bob  FALSE
2        2         Ms. Blank   TRUE
3        3        Roger, Mr.  FALSE
4        4   MR Mark Simpson  FALSE
5        5         Miss Lisa   TRUE
6        6       Mrs. joshep   TRUE
7        7      Rakesh Kumar     NA
8        8 Kumar Gums Murphy     NA
Here is a more concise version that does exactly what you asked for. You still need to specify all possible titles (Titles) and the female titles (f.Titles).
# One column of matches per title; sapply() names the columns after Titles
check <- sapply(Titles, function(p) grepl(p, df$Name, perl = TRUE))
# NA if no title matched, otherwise TRUE only when a female title matched
df$Female <- apply(check, 1, function(x) if (!any(x)) NA else any(names(which(x)) %in% f.Titles))
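If listing every capitalisation gets tedious, one possible alternative is a case-insensitive match with word boundaries. This is only a sketch, not part of the approach above: the patterns female_pat and male_pat are my own assumptions and cover just the titles that appear in your example.

```r
df <- data.frame(PersonID = 1:8,
                 Name = c("Mr. Bob", "Ms. Blank", "Roger, Mr.", "MR Mark Simpson",
                          "Miss Lisa", "Mrs. joshep", "Rakesh Kumar", "Kumar Gums Murphy"))

# Assumed patterns: \\b is a word boundary, so "mr" does not match inside "Mrs"
female_pat <- "\\b(miss|ms|mrs)\\b"
male_pat   <- "\\bmr\\b"

# ignore.case = TRUE covers "MR", "Mr", "mr" and so on with a single pattern
is_female <- grepl(female_pat, df$Name, ignore.case = TRUE, perl = TRUE)
is_male   <- grepl(male_pat,   df$Name, ignore.case = TRUE, perl = TRUE)

# TRUE for a female title, FALSE for a male title, NA when neither is found
df$Female <- ifelse(is_female, TRUE, ifelse(is_male, FALSE, NA))
print(df)
```

On the example data this should give the same Female column as the versions above. The word boundaries also keep substrings such as the "ms" in "Gums" from being counted as a title.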
Upvotes: 1