Reputation: 1516
I'm a novice with regular expressions...
Assume the following names:
names <- c("Jackson, Michael", "Lennon, John", "Obama, Barack")
I want to split the names so as to retain all the characters up to and including the first letter of the first name. Thus, the results would look like this:
Jackson, M
Lennon, J
Obama, B
I know this has a simple solution, but I am stuck on specifying what seems to be a reasonable approach: a positive lookahead regex. I am matching on the comma, the space, and the capitalized first letter. This is what I have, but it is obviously wrong:
names.reduced <- gsub("(?=\\,\\s[A-Z]).*", "", names)
Upvotes: 5
Views: 2658
Reputation: 70732
(?= ... ) is a zero-width assertion: it does not consume any characters in the string, it only matches a position. Its purpose is to check whether a pattern can be matched looking ahead from the current position, without adding those characters to the overall match. In this case, a lookahead assertion is not necessary at all.
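To see why the original attempt does not give the desired result, here is a small check using base R only. Lookarounds need perl = TRUE in base R. Even with it, the lookahead matches the empty position just before the comma, so .* deletes ", Michael" as well and only the surname survives.

```r
names <- c("Jackson, Michael", "Lennon, John", "Obama, Barack")

# The question's pattern, run with the PCRE engine so the lookahead is accepted.
# The match starts at the position just before ", M" (zero-width), and .* then
# consumes everything from the comma to the end of the string.
attempt <- gsub("(?=\\,\\s[A-Z]).*", "", names, perl = TRUE)
print(attempt)
# returns c("Jackson", "Lennon", "Obama")
```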
You can do this using a capture group and backreferencing the group in the replacement string:
sub('(.*[A-Z]).*', '\\1', names)
# [1] "Jackson, M" "Lennon, J" "Obama, B"
Or, better yet, you can use a negated character class to remove the run of characters other than A to Z at the end of the string:
sub('[^A-Z]*$', '', names)
# [1] "Jackson, M" "Lennon, J" "Obama, B"
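One design note, offered as a sketch: both patterns above key on the last uppercase letter in the string, because .*[A-Z] is greedy and [^A-Z]*$ only strips what follows the final capital. With a hypothetical first name containing a second capital, such as "Smith, JoAnn", they keep too much. Anchoring on the first comma avoids that:

```r
names2 <- c("Jackson, Michael", "Smith, JoAnn")  # "Smith, JoAnn" is a made-up test case

# Both patterns above stop at the LAST capital letter in the string.
capture <- sub('(.*[A-Z]).*', '\\1', names2)  # c("Jackson, M", "Smith, JoA")
negated <- sub('[^A-Z]*$', '', names2)        # c("Jackson, M", "Smith, JoA")

# Anchored alternative: [^,]* stops at the first comma, then ", " and one capital.
anchored <- sub('^([^,]*, [A-Z]).*$', '\\1', names2)  # c("Jackson, M", "Smith, J")
```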
Upvotes: 10
Reputation: 174706
You could also use the regmatches function:
> names <- c("Jackson, Michael", "Lennon, John", "Obama, Barack")
> regmatches(names, regexpr(".*,\\s*[A-Z]", names))
[1] "Jackson, M" "Lennon, J" "Obama, B"
OR
> library(stringi)
> stri_extract(names, regex=".*,\\s*[A-Z]")
[1] "Jackson, M" "Lennon, J" "Obama, B"
OR
Or just match all the characters up to and including the last uppercase letter:
> stri_extract(names, regex=".*[A-Z]")
[1] "Jackson, M" "Lennon, J" "Obama, B"
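One behavior worth knowing before choosing regmatches over sub: regmatches() combined with regexpr() returns only the elements that matched, so the result can be shorter than the input, whereas sub() returns non-matching strings unchanged. A small base R sketch with a hypothetical single-word name:

```r
x <- c("Jackson, Michael", "Prince")  # "Prince" is made up; it has no comma, so no match

m <- regexpr(".*,\\s*[A-Z]", x)
extracted <- regmatches(x, m)
# returns "Jackson, M" (length 1): the non-matching element is dropped

replaced <- sub('(.*,\\s*[A-Z]).*', '\\1', x)
# returns c("Jackson, M", "Prince"): same length as x
```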
Upvotes: 2
Reputation: 887088
You can use a lookbehind instead of the lookahead assertion, so that the match starts right after the first letter of the first name:
sub('(?<=, [A-Z]).*$', '', names, perl=TRUE)
#[1] "Jackson, M" "Lennon, J" "Obama, B"
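The perl=TRUE argument is what makes this work. A quick check, assuming a standard R build where the default engine (TRE) does not support lookbehind, so the same call without perl=TRUE fails to compile the pattern:

```r
names <- c("Jackson, Michael", "Lennon, John", "Obama, Barack")

# With perl = TRUE, the PCRE engine accepts the lookbehind.
with_perl <- sub('(?<=, [A-Z]).*$', '', names, perl = TRUE)
# returns c("Jackson, M", "Lennon, J", "Obama, B")

# Without it, the default engine rejects the pattern; catch the error to show it.
without_perl <- tryCatch(
  suppressWarnings(sub('(?<=, [A-Z]).*$', '', names)),
  error = function(e) conditionMessage(e)
)
print(without_perl)  # a single error message string, not a length-3 result
```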
Upvotes: 3