Reputation: 23
I have a vector from which I need to extract the last names into a list. I will then use that list to compare against a set of last names using match. I am having trouble extracting the last names. Here is an example of the data:
Suzanne Sar Abay bob, Lucy Heaton, Lynn Slaney, Michael Hughes
I need the last names of these.
vector <- gsub("\s(\w+)$", "", data_agent$Name, perl = TRUE)
This ends up giving me "Suzanne Sar Abay", "Lucy", "Lynn", "Michael", not the last names. The regex selects the last name successfully; however, I realized that gsub replaces the matched text rather than returning it.
vector1 <- gsub("(.+)\s\w+$", "", data_agent$List.Name, perl = TRUE)
This is supposed to select everything except the last name, but it is not working: all it returns is blanks ("" "" ""). Can someone help me with this?
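For the comparison step mentioned above, a minimal sketch of how the extracted last names could feed into match. It assumes the last names were extracted with one of the sub-based approaches from the answers, and known_last_names is a hypothetical reference set made up for illustration.

```r
x <- c("Suzanne Sar Abay bob", "Lucy Heaton", "Lynn Slaney", "Michael Hughes")

# Extract the last names first (here: remove everything up to the last whitespace)
last_names <- sub(".*\\s", "", x)

# Hypothetical set of last names to compare against
known_last_names <- c("Hughes", "Heaton", "Smith")

# match() gives each name's position in known_last_names, or NA if it is absent
positions <- match(last_names, known_last_names)
print(positions)

# %in% gives TRUE/FALSE instead of positions
found <- last_names %in% known_last_names
print(found)
```

match is useful when you need to look up a corresponding row in the reference set; %in% is enough when you only need to know whether a name is present.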
Upvotes: 0
Views: 241
Reputation: 887651
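There are two problems in the code. First, in an R string literal the backslash itself must be escaped, so the regex escapes \s and \w (as written in the OP's original post) have to be typed as \\s and \\w. Second, the last word is already captured by placing \\w+ inside parentheses ((...)), so in the replacement we should use the backreference \\1 instead of ''. For the result to be only the last name, the pattern must also match everything before it (.*), so that the whole string is replaced by the captured word: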
sub(".*\\s+(\\w+)$", "\\1", data_agent$Name)
#[1] "bob" "Heaton" "Slaney" "Hughes"
Or using stringi
library(stringi)
stri_extract_last(data_agent$Name, regex='\\w+')
#[1] "bob" "Heaton" "Slaney" "Hughes"
The example data used above:
data_agent <- structure(list(Name = c("Suzanne Sar Abay bob", "Lucy Heaton",
  "Lynn Slaney", "Michael Hughes")), .Names = "Name", row.names = c(NA, -4L),
  class = "data.frame")
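To make the two problems concrete, here is a small sketch that runs the question's patterns (with the backslashes escaped) next to the fix. The vector x is just the example names from the question; "applying only the \\1 change" is an illustrative experiment, not something the OP ran.

```r
x <- c("Suzanne Sar Abay bob", "Lucy Heaton", "Lynn Slaney", "Michael Hughes")

# In R source code, "\\s" is the two-character regex \s (the backslash is escaped)
print(nchar("\\s"))

# Applying only the \\1 change to the first pattern: just " bob" is matched,
# so the text before it survives and the captured word is glued onto it
partial <- sub("\\s(\\w+)$", "\\1", x)
print(partial)

# The second pattern matches the whole string, so replacing it with "" leaves blanks
blank <- sub("(.+)\\s\\w+$", "", x)
print(blank)

# Fix: consume everything up to the last word and keep only the captured word
fixed <- sub(".*\\s+(\\w+)$", "\\1", x)
print(fixed)
```

The partial result ("Suzanne Sar Abaybob" and so on) shows why the leading .* is needed in addition to the backreference.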
Upvotes: 0
Reputation: 174796
Keep it simple. Just remove all the characters up to the last space.
Simply use sub:
sub(".*\\s", "", data_agent$Name)
.* is greedy by default, so it first matches all the characters up to the end of the string and then backtracks to the last space, because we put \\s right after .*. So it matches all the characters up to and including the last space, and sub replaces them with the empty string.
Example:
> x <- c('Suzanne Sar Abay bob', 'Lucy Heaton', 'Lynn Slaney', 'Michael Hughes')
> sub(".*\\s", "", x)
[1] "bob" "Heaton" "Slaney" "Hughes"
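A quick way to see the greediness described above is to compare .* with its lazy counterpart .*?, which stops at the first whitespace instead of the last. This is a sketch using the same example vector; perl = TRUE is used here only to be explicit about which regex engine handles the lazy quantifier.

```r
x <- c("Suzanne Sar Abay bob", "Lucy Heaton", "Lynn Slaney", "Michael Hughes")

# Greedy: .* grabs as much as possible, then backtracks to the LAST whitespace
greedy <- sub(".*\\s", "", x, perl = TRUE)

# Lazy: .*? grabs as little as possible, so it stops at the FIRST whitespace
lazy <- sub(".*?\\s", "", x, perl = TRUE)

print(greedy)
print(lazy)
```

Only the multi-word first name differs: the lazy version leaves "Sar Abay bob", which is why the greedy form is the right one for last names.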
OR
Just extract the last word.
> library(stringr)
> str_extract(x, "\\w+$")
[1] "bob" "Heaton" "Slaney" "Hughes"
> str_extract(x, "\\S+$")
[1] "bob" "Heaton" "Slaney" "Hughes"
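If stringr is not available, a base R sketch of the same extraction uses regexpr plus regmatches. The names "Anne Smith-Jones" and "Mary O'Brien" are hypothetical additions, included only to show where \\w+$ and \\S+$ diverge.

```r
x <- c("Suzanne Sar Abay bob", "Lucy Heaton", "Anne Smith-Jones", "Mary O'Brien")

# regexpr() locates the match in each string; regmatches() returns the matched text
# (note: regmatches() silently drops elements that have no match)
last_word   <- regmatches(x, regexpr("\\w+$", x))
last_nonspc <- regmatches(x, regexpr("\\S+$", x))

print(last_word)
print(last_nonspc)
```

\\w+$ stops at punctuation, so hyphenated or apostrophe names are cut to "Jones" and "Brien", while \\S+$ keeps everything after the last whitespace.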
Upvotes: 1
Reputation: 67988
^.*(?=\\b\\w+$)
You need to put the last word inside a lookahead, so that it must be present but is not part of the replaced text. See the demo:
https://regex101.com/r/uF4oY4/64
gsub("^.*(?=\\b\\w+$)", "", data_agent$List.Name, perl = TRUE)
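A self-contained sketch of the lookahead pattern on the question's example names (using a plain vector x instead of data_agent$List.Name). Lookaheads such as (?=...) are a Perl-style feature, which is why perl = TRUE is needed in base R.

```r
x <- c("Suzanne Sar Abay bob", "Lucy Heaton", "Lynn Slaney", "Michael Hughes")

# ^.* matches from the start; the lookahead (?=\\b\\w+$) only checks that a final
# word follows, without consuming it, so just the text before the last word is removed
last_names <- gsub("^.*(?=\\b\\w+$)", "", x, perl = TRUE)
print(last_names)
```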
Upvotes: 1