Reputation: 109844
If I have a string and want to split on the last digit and keep the last part of the split, how can I do that?
x <- c("ID", paste0("X", 1:10, state.name[1:10]))
I'd like
[1] NA "Alabama" "Alaska" "Arizona" "Arkansas"
[6] "California" "Colorado" "Connecticut" "Delaware" "Florida"
[11] "Georgia"
But would settle for:
[1] "ID" "Alabama" "Alaska" "Arizona" "Arkansas"
[6] "California" "Colorado" "Connecticut" "Delaware" "Florida"
[11] "Georgia"
I can get the first part by:
unlist(strsplit(x, "[^0-9]*$"))
But want the second part.
Thank you in advance.
Upvotes: 6
Views: 639
Reputation: 269481
gsubfn
Try this gsubfn solution:
> library(gsubfn)
> strapply(x, ".*\\d(\\w*)|$", ~ if (nchar(z)) z else NA, simplify = TRUE)
[1] NA "Alabama" "Alaska" "Arizona" "Arkansas"
[6] "California" "Colorado" "Connecticut" "Delaware" "Florida"
[11] "Georgia"
It matches everything up to the last digit followed by word characters and returns those word characters; if that fails, it matches the end of the line (to ensure that it matches something). If the first alternative matched, the captured text is returned; otherwise the back reference is empty, so NA is returned.
Note that the formula is a shorthand way of writing the function function(z) if (nchar(z)) z else NA, and that function could alternatively replace the formula at the expense of a few more keystrokes.
gsub
A similar strategy could also work using just straight gsub, but it requires two lines and a marginally more complex regular expression. Here we use the second alternative to slurp up non-matches from the first alternative:
> s <- gsub(".*\\d(\\w*)|.*", "\\1", x)
> ifelse(nchar(s), s, NA)
[1] NA "Alabama" "Alaska" "Arizona" "Arkansas"
[6] "California" "Colorado" "Connecticut" "Delaware" "Florida"
[11] "Georgia"
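As a minimal sketch, the two gsub lines above could be wrapped in a small helper. The name after_last_digit is hypothetical (it is not part of base R or gsubfn), and nzchar() is swapped in for nchar() as the non-empty test:

```r
# Hypothetical helper: text after the last digit, or NA when nothing is found
after_last_digit <- function(x) {
  # First alternative captures the word characters after the last digit;
  # the second alternative consumes strings without a digit, so group 1 is empty
  s <- gsub(".*\\d(\\w*)|.*", "\\1", x)
  # Turn empty results into NA, keeping the result a character vector
  ifelse(nzchar(s), s, NA_character_)
}

x <- c("ID", paste0("X", 1:10, state.name[1:10]))
res <- after_last_digit(x)
res
```

Under this sketch a string that ends in a digit also comes back as NA, because nothing follows the last digit.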
EDIT: minor improvements
Upvotes: 2
Reputation: 179408
You can do this in one easy step with a regular expression:
gsub("(^.*\\d+)(\\w*)", "\\2", x)
Results in:
[1] "ID" "Alabama" "Alaska" "Arizona" "Arkansas" "California" "Colorado" "Connecticut"
[9] "Delaware" "Florida" "Georgia"
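To see how the pattern splits a single string, the two groups can be pulled out with regexec() and regmatches() from base R. This is only an illustrative sketch; the variable names are assumptions, not part of the answer:

```r
pattern <- "(^.*\\d+)(\\w*)"

# Full match first, then group 1 and group 2, for an ID with two digits
parts <- regmatches("X10Georgia", regexec(pattern, "X10Georgia"))[[1]]
parts
# [1] "X10Georgia" "X10"        "Georgia"

# "ID" contains no digit, so the pattern does not match at all
no_match <- regmatches("ID", regexec(pattern, "ID"))[[1]]
length(no_match)
# [1] 0
```

Since "ID" never matches, gsub() has nothing to replace and returns it unchanged, which is why the result starts with "ID" rather than NA.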
What the regex does:
"(^.*\\d+)(\\w*)": Look for two groups of characters.
(^.*\\d+) matches, from the start of the string, any characters followed by at least one digit. Because .* is greedy, this group extends through the last digit in the string.
(\\w*) matches zero or more word characters (letters, digits or underscore) after that.
"\\2" as the second argument to gsub() means to replace the matched text with the second group that the regex found.
Upvotes: 4
Reputation: 93813
This seems a bit clunky, but it works:
state.pt2 <- unlist(strsplit(x,"^.[0-9]+"))
state.pt2[state.pt2!=""]
It would be nice to remove the ""'s generated by the match at the start of the string, but I can't figure that out.
Here's another method, using substr and gregexpr, that avoids having to subset the results:
substr(x,unlist(lapply(gregexpr("[0-9]",x),max))+1,nchar(x))
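Broken into steps, the one-liner above might look like the following sketch. The intermediate names are assumptions for illustration, and the final line that sets NA for strings without a digit is an addition, not part of the original one-liner:

```r
x <- c("ID", paste0("X", 1:10, state.name[1:10]))

# Positions of every digit in each string; -1 when a string has no digit
digit_pos <- gregexpr("[0-9]", x)

# Position of the last digit in each string (-1 for "ID")
last_digit <- unlist(lapply(digit_pos, max))

# Everything after the last digit; for "ID" the start is 0, so the whole string is kept
tail_part <- substr(x, last_digit + 1, nchar(x))

# Optional extra step: mark strings without any digit as NA
tail_na <- tail_part
tail_na[last_digit < 0] <- NA
tail_na
```

The last step gives the NA-first result asked for in the question, while tail_part matches the "would settle for" version.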
Upvotes: 2
Reputation: 115382
library(stringr)
unlist(lapply(str_split(x, "[0-9]"), tail,n=1))
gives
[1] "ID" "Alabama" "Alaska" "Arizona" "Arkansas" "California" "Colorado" "Connecticut" "Delaware"
[10] "Florida" "Georgia"
I would look at the stringr documentation for (quite possibly) an even better approach.
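One possibility, offered as a sketch rather than as the approach this answer had in mind, is str_extract() with a lookbehind. The pattern below is an assumption, not taken from the stringr documentation:

```r
library(stringr)

x <- c("ID", paste0("X", 1:10, state.name[1:10]))

# The lookbehind requires a digit immediately before the match;
# \\D*$ then takes the non-digit characters up to the end of the string
res <- str_extract(x, "(?<=\\d)\\D*$")
res
```

str_extract() returns NA where the pattern does not match, so "ID" should come back as NA without a separate step.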
Upvotes: 2