Reputation: 1770
I have a column containing a list of names, each of which includes a title.
Example:
Ryerson, Master. John Borie
Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)
I would like to extract the titles from the names, i.e. Mr, Mrs, Master, etc.
Function:
mystring <- "Wilkes, Master. James (Ellen Needs)"
substr(sub(".*,", "", mystring), 2, which(strsplit(sub(".*,", "", mystring), "")[[1]] == ".") - 1)
[1] "Master"
When I test the above function on a single name, it works fine. But when I apply the same function to the whole column of names, it extracts only two characters.
Example: Ryerson, Master. John Borie
I expect 'Master' to be extracted from this name, but I get 'Ma'.
[436] "Mi" "Mi" "Mr" "Mr" "Mr" "Mr" "Mr" "Mr" "Ms" "Mr" "Ma" "Mi" "Mr" "Mi" "Ma"
I don't know what's wrong with the function. Appreciate your help!
Upvotes: 1
Views: 308
Reputation: 2359
If the title is separated from the rest of the name by a space, e.g. "Mr. Mahesh", you can try this code:
my <- c("MR. Arun", "Master. mahesh")
y <- do.call(rbind,strsplit(my," "))
z <- y[,1]
print(z)
[1] "MR." "Master."
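Note that do.call(rbind, ...) builds a matrix, so when names split into different numbers of pieces, rbind() recycles the shorter ones (possibly with a warning), although the first column still holds the first token. A minimal sketch that avoids the matrix, assuming the title is always the first space-separated token; first_token is a hypothetical helper name, not part of the answer above:

```r
# Hypothetical helper: return the first space-separated token of each string.
first_token <- function(x) {
  pieces <- strsplit(x, " ", fixed = TRUE)
  # Take element 1 of every split result; vapply guarantees a character vector
  vapply(pieces, function(p) p[1], character(1))
}

# Made-up names with different numbers of words
my <- c("MR. Arun", "Master. mahesh kumar")

titles <- first_token(my)
# Drop the trailing dot, if there is one
titles <- sub("\\.$", "", titles)
print(titles)
```

This only helps when the string starts with the title; for names in the "Surname, Title. Given" form the first token is the surname.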
Upvotes: 1
Reputation: 887118
Based on the example shown, we can match one or more characters that are not a comma ([^,]+), followed by a comma and one or more spaces (\\s+), anchored at the beginning (^) of the string, or (|) a dot (\\.) followed by any characters up to the end of the string (.*), and replace the match with ''.
gsub("^[^,]+,\\s+|\\..*$", "", str1)
#[1] "Master" "Mrs"
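As for why the substr() expression in the question returns only two characters on a column: strsplit(...)[[1]] looks only at the first name, so the position of the dot is computed once and reused as the stop argument for every element. If the first name in the column happens to carry the title Mr (an assumption about the data, which is not shown), every result is cut to two characters. A sketch of a per-element variant using base R's regexpr(); extract_title and names_vec are hypothetical names, and the first entry of names_vec is made up for illustration:

```r
# Made-up vector whose first element has a two-letter title
names_vec <- c("Smith, Mr. John",
               "Ryerson, Master. John Borie",
               "Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)")

# Expression from the question: the stop position comes from element 1 only
stop_first <- which(strsplit(sub(".*,", "", names_vec), "")[[1]] == ".") - 1
truncated <- substr(sub(".*,", "", names_vec), 2, stop_first)

# Hypothetical helper: locate the first dot separately in every element
extract_title <- function(x) {
  after_comma <- sub(".*,", "", x)                    # e.g. " Master. John Borie"
  dot_pos <- regexpr(".", after_comma, fixed = TRUE)  # one position per element
  substr(after_comma, 2, dot_pos - 1)                 # skip the leading space, stop before the dot
}

fixed_titles <- extract_title(names_vec)
print(truncated)
print(fixed_titles)
```

Here truncated reproduces the two-character results, while fixed_titles holds the full titles because substr() accepts a vector of stop positions.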
If the title is always the second 'word', then word() from stringr can be used (note that the trailing dot is kept):
library(stringr)
word(str1, 2)
#[1] "Master." "Mrs."
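Since gsub() is vectorized, the pattern above can be applied directly to a data frame column. A minimal sketch in base R, assuming the names live in a column called Name (the data frame df and both column names are hypothetical):

```r
# Hypothetical data frame; the column name Name is an assumption
df <- data.frame(
  Name = c("Ryerson, Master. John Borie",
           "Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)"),
  stringsAsFactors = FALSE
)

# Remove "Surname, " from the start and everything from the first dot onwards
df$Title <- gsub("^[^,]+,\\s+|\\..*$", "", df$Name)

# Count how often each title occurs
print(table(df$Title))
```

The same assignment works with word(df$Name, 2) if the stringr approach is preferred.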
Data:
str1 <- c("Ryerson, Master. John Borie",
"Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)")
Upvotes: 2