David L

Reputation: 23

Selecting last names with regex in R

I have a vector of full names from which I need to extract the last names into a list. I will then compare that list against a set of last names using match. I am having trouble extracting the last names. Here is an example of the data:

Suzanne Sar Abay bob, Lucy Heaton, Lynn Slaney, Michael Hughes

I need the last names of these.

vector <- gsub("\s(\w+)$", "", data_agent$Name, perl = TRUE)

This ends up giving me "Suzanne Sar Abay", "Lucy", "Lynn", "Michael", not the last names. The regex matches the last name correctly; however, I realized that gsub replaces the match instead of returning it.

vector1 <- gsub("(.+)\s\w+$", "", data_agent$List.Name, perl = TRUE)

This is supposed to select everything except the last name, but it is not working. All it returns is blanks: "" "" ""
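A minimal sketch of why the second call returns blanks, assuming the example names are held in a plain character vector `x` (the real data is in `data_agent$List.Name`). The pattern `(.+)\\s\\w+$` matches each entire string, so replacing the match with `""` leaves nothing; referring back to capture group 1 with `\\1` keeps everything except the last word instead.

```r
# Example names from the question (assumed to be a plain character vector)
x <- c("Suzanne Sar Abay bob", "Lucy Heaton", "Lynn Slaney", "Michael Hughes")

# (.+)\\s\\w+$ matches the WHOLE string, so replacing the match with "" blanks it
blank <- gsub("(.+)\\s\\w+$", "", x, perl = TRUE)
print(blank)

# Using the backreference \\1 keeps capture group 1 (everything before the last word)
first_part <- gsub("(.+)\\s\\w+$", "\\1", x, perl = TRUE)
print(first_part)
```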

I was wondering if someone could help me with this.

Upvotes: 0

Views: 241

Answers (3)

akrun

Reputation: 887651

There are two problems in the code. First, inside an R string literal, the \s and \w from the OP's original post must be escaped as \\s and \\w. Second, because \\w+ is placed inside a capture group ((...)), the replacement should be the backreference \\1 rather than "", so that the captured last name is kept instead of removed.

 sub(".*\\s+(\\w+)$", "\\1", data_agent$Name)
 #[1] "bob"    "Heaton" "Slaney" "Hughes"

Or, using stringi:

library(stringi)
stri_extract_last(data_agent$Name, regex='\\w+')
#[1] "bob"    "Heaton" "Slaney" "Hughes"

Data:

data_agent <- structure(list(Name = c("Suzanne Sar Abay bob",
                                      "Lucy Heaton",
                                      "Lynn Slaney",
                                      "Michael Hughes")),
                        .Names = "Name",
                        row.names = c(NA, -4L),
                        class = "data.frame")
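The question mentions comparing the extracted last names against another set using match. A minimal sketch of that step, assuming a hypothetical reference vector `known` (not from the original data): `match()` returns the position of each last name in `known` (or `NA`), while `%in%` returns a plain logical vector.

```r
data_agent <- data.frame(Name = c("Suzanne Sar Abay bob", "Lucy Heaton",
                                  "Lynn Slaney", "Michael Hughes"),
                         stringsAsFactors = FALSE)

# Extract the last word of each name with the sub() call from this answer
last_names <- sub(".*\\s+(\\w+)$", "\\1", data_agent$Name)

# Hypothetical reference set of last names to compare against
known <- c("Hughes", "Smith", "Heaton")

# Position of each last name in `known`, or NA when it is not there
pos <- match(last_names, known)
print(pos)

# Simple membership test as a logical vector
found <- last_names %in% known
print(found)
```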

Upvotes: 0

Avinash Raj

Reputation: 174796

Keep it simple. Just remove all the characters up to the last space.

Simply use sub:

sub(".*\\s", "", data_agent$Name)

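.* is greedy by default, so it first matches all the characters to the end of the string and then backtracks to the last space, because we included \\s right after .*. So the pattern matches everything up to and including the last space, and sub replaces that with "", leaving only the last word.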
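A small sketch contrasting the greedy `.*` with a lazy `.*?` on the same names, to show that greediness is what makes the pattern stop at the last space rather than the first. The lazy variant is added here only for comparison; `perl = TRUE` is used for it to be safe about lazy-quantifier support.

```r
x <- c("Suzanne Sar Abay bob", "Lucy Heaton", "Lynn Slaney", "Michael Hughes")

# Greedy: .* runs to the end of the string, then backtracks to the LAST space
greedy <- sub(".*\\s", "", x)
print(greedy)

# Lazy: .*? stops at the FIRST space, so only the first word is removed
lazy <- sub(".*?\\s", "", x, perl = TRUE)
print(lazy)
```

The two only differ for names with more than two words, such as "Suzanne Sar Abay bob".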

Example:

> x <- c('Suzanne Sar Abay bob', 'Lucy Heaton', 'Lynn Slaney', 'Michael Hughes')
> sub(".*\\s", "", x)
[1] "bob"    "Heaton" "Slaney" "Hughes"

OR

Just extract the last word.

> library(stringr)
> str_extract(x, "\\w+$")
[1] "bob"    "Heaton" "Slaney" "Hughes"
> str_extract(x, "\\S+$")
[1] "bob"    "Heaton" "Slaney" "Hughes"
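If you would rather not load a package, a base R sketch of the same extraction uses `regexpr()` to locate the match and `regmatches()` to pull it out (strings with no match are dropped from the result). The names in `y` are hypothetical, added only to show where `\\w+$` and `\\S+$` differ: `\\w` stops at apostrophes and hyphens, while `\\S` takes everything after the last space.

```r
x <- c("Suzanne Sar Abay bob", "Lucy Heaton", "Lynn Slaney", "Michael Hughes")

# Base R equivalent of str_extract(x, "\\w+$")
last_w <- regmatches(x, regexpr("\\w+$", x))
print(last_w)

# Hypothetical names containing an apostrophe and a hyphen
y <- c("Mary O'Neil", "Anna Smith-Jones")
w_only <- regmatches(y, regexpr("\\w+$", y))     # word characters only
non_space <- regmatches(y, regexpr("\\S+$", y))  # everything after the last space
print(w_only)
print(non_space)
```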

Upvotes: 1

vks

Reputation: 67988

^.*(?=\\b\\w+$)

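You need to put the last word in a lookahead, so that it is checked but not consumed by the match (and therefore not replaced). See the demo: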

https://regex101.com/r/uF4oY4/64

gsub("^.*(?=\\b\\w+$)", "", data_agent$List.Name, perl = TRUE)
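A short sketch showing what this pattern actually matches, assuming the names are in a plain character vector `x`. `regmatches()` with `regexpr(..., perl = TRUE)` reveals that the match is only the prefix before the last word (the lookahead is not part of it), which is why replacing it with `""` leaves the last name. `perl = TRUE` is needed because the lookahead `(?=...)` is a PCRE feature.

```r
x <- c("Suzanne Sar Abay bob", "Lucy Heaton", "Lynn Slaney", "Michael Hughes")
pat <- "^.*(?=\\b\\w+$)"

# The matched text: everything before the final word (trailing space included)
prefix <- regmatches(x, regexpr(pat, x, perl = TRUE))
print(prefix)

# Removing that prefix leaves only the last name
last <- gsub(pat, "", x, perl = TRUE)
print(last)
```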

Upvotes: 1
