ksp585

Reputation: 1770

R - Extract substring

I have a column containing a list of names, each of which includes a title.

Example:

Ryerson, Master. John Borie
Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)

I would like to extract the titles from the names, i.e., Mr, Mrs, Master, etc.

Function:

In[79]:
mystring="Wilkes, Master. James (Ellen Needs)"
In[80]:
substr(sub(".*,", "", mystring),2,which(strsplit(sub(".*,", "", mystring),"")[[1]]==".")-1)
Out[80]:
[1] "Master"

When I test the above function on a single name, it works fine. But when I apply the same function to the whole column of names, it extracts only two characters.

Example: Ryerson, Master. John Borie

I would like to see 'Master' extracted from this name whereas I see 'Ma'.

[436] "Mi" "Mi" "Mr" "Mr" "Mr" "Mr" "Mr" "Mr" "Ms" "Mr" "Ma" "Mi" "Mr" "Mi" "Ma"

I don't know what's wrong with the function. Appreciate your help!

Upvotes: 1

Views: 308

Answers (2)

Arun kumar mahesh

Reputation: 2359

If the title is separated from the rest of the name by a space, e.g. "Mr. Mahesh", you can split on the space and keep the first piece:

my <- c("MR. Arun", "Master. mahesh")
y <- do.call(rbind,strsplit(my," "))
z <- y[,1]
print(z)
[1] "MR."     "Master."
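The names in the question put the surname first ("Surname, Title. Given names"), so the title is the second token rather than the first. A minimal sketch of the same splitting idea adapted to that layout, assuming the title is always the second space-separated token; `names_vec`, `parts`, `second` and `titles` are illustrative names, not from the original posts:

```r
# Sample names copied from the question ("Surname, Title. Given names")
names_vec <- c("Ryerson, Master. John Borie",
               "Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)")

# Split each name on single spaces; the result is a list with one
# character vector per name, and the vectors may differ in length
parts <- strsplit(names_vec, " ", fixed = TRUE)

# Pick the second token of every name (assumption: that is the title)
second <- vapply(parts, function(p) p[2], character(1))

# Drop the trailing period, if there is one
titles <- sub("\\.$", "", second)
print(titles)
```

Using `vapply` on the list avoids `rbind`, which warns and recycles values when the names have different numbers of words.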

Upvotes: 1

akrun

Reputation: 887118

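Based on the example shown, we can use gsub with a pattern that has two alternatives separated by |. The first matches, from the beginning (^) of the string, one or more characters that are not a comma ([^,]+), followed by a comma and one or more spaces (\\s+). The second matches a dot (\\.) followed by any characters up to the end of the string (.*$). Both matches are replaced with '', which leaves only the title.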

gsub("^[^,]+,\\s+|\\..*$", "", str1)
#[1] "Master" "Mrs"  
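As for why the expression in the question returns only two characters on a column: a likely explanation is that `strsplit(...)[[1]]` looks at the first name only, so the dot position is computed once and `substr` recycles that single `stop` value for every row. The sketch below reproduces this, assuming the first name in the column carries a two-letter title; "Smith, Mr. Adam" is a made-up example and `names_vec`, `rest`, `stop_first`, `broken`, `dot_pos` and `fixed` are illustrative names:

```r
# First entry is hypothetical; second is from the question
names_vec <- c("Smith, Mr. Adam", "Ryerson, Master. John Borie")

# Drop everything up to and including the comma
rest <- sub(".*,", "", names_vec)

# Original approach: [[1]] inspects only the FIRST string,
# so a single stop position is used for all rows
stop_first <- which(strsplit(rest, "")[[1]] == ".") - 1
broken <- substr(rest, 2, stop_first)
print(broken)

# Vectorised variant: regexpr() returns the position of the first
# literal dot in each string, giving one stop value per row
dot_pos <- regexpr(".", rest, fixed = TRUE)
fixed <- substr(rest, 2, dot_pos - 1)
print(fixed)
```

With a per-row dot position the original `substr` idea works on the whole column; the `gsub` pattern above gets the same result in one step.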

If the title is always the second 'word', then word from stringr can be used (note that the trailing dot is kept):

library(stringr)
word(str1, 2)
#[1] "Master." "Mrs."   

data

str1 <- c("Ryerson, Master. John Borie", 
       "Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)")
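Since the question is about a column rather than a standalone vector, here is a minimal sketch of applying the same `gsub` pattern to a data frame column. The data frame `df` and the column names `Name` and `Title` are assumptions for illustration; the original post does not show the real ones:

```r
# Sample data from above, wrapped in a hypothetical data frame
str1 <- c("Ryerson, Master. John Borie",
          "Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)")
df <- data.frame(Name = str1, stringsAsFactors = FALSE)

# gsub is vectorised, so it processes every row of the column at once
df$Title <- gsub("^[^,]+,\\s+|\\..*$", "", df$Name)
print(df$Title)

# Count how often each title occurs
title_counts <- table(df$Title)
print(title_counts)
```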

Upvotes: 2
