Reputation: 2570
If I have these strings:
mystrings <- c("X2/D2/F4",
"X10/D9/F4",
"X3/D22/F4",
"X9/D22/F9")
How can I extract 2, 9, 22, 22? These characters sit between the two / and come after the first character within them.
I would like to do this in a vectorized fashion and, if possible, add the new column with transform, with which I am familiar.
I think this regex gets me somewhere near all the characters within \:
^.*\\'(.*)'\\.*$
Upvotes: 18
Views: 17598
Reputation: 47300
Using the unglue package, you could do:
# install.packages("unglue")
library(unglue)
unglue_vec(mystrings, "{x}/{y}/{z}", var = "y")
#> [1] "D2" "D9" "D22" "D22"
From a data frame you could use unglue_unnest(), so there is no need to use transform():
df <- data.frame(col = mystrings)
unglue_unnest(df, col, "{x}/{y}/{z}", remove = FALSE)
#>         col   x   y  z
#> 1  X2/D2/F4  X2  D2 F4
#> 2 X10/D9/F4 X10  D9 F4
#> 3 X3/D22/F4  X3 D22 F4
#> 4 X9/D22/F9  X9 D22 F9
# or use unnamed subpatterns to keep only the middle value
unglue_unnest(df, col, "{=.*?}/{y}/{=.*?}", remove = FALSE)
#>         col   y
#> 1  X2/D2/F4  D2
#> 2 X10/D9/F4  D9
#> 3 X3/D22/F4 D22
#> 4 X9/D22/F9 D22
Created on 2019-11-06 by the reprex package (v0.3.0)
More info: https://github.com/moodymudskipper/unglue/blob/master/README.md
Upvotes: 0
Reputation: 263331
> gsub("(^.+/[A-Z]+)(\\d+)(/.+$)", "\\2", mystrings)
[1] "2" "9" "22" "22"
You would "read" (or "parse") that regex pattern as splitting any matched string into three parts:
1) anything up to and including the first forward slash, followed by a sequence of capital letters;
2) a sequence of digits ("\d") before the next slash; and
3) everything from the next slash to the end.
Only the second part is then returned.
Non-matched character strings would be returned unaltered.
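To illustrate that last point, here is a small base R sketch. The extra input "no-slashes" and the helper name extract_mid are made up for this example; the helper is only one possible way to get NA instead of the unaltered string.

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")
pattern <- "(^.+/[A-Z]+)(\\d+)(/.+$)"

# "no-slashes" is a made-up input that the pattern cannot match
x <- c(mystrings, "no-slashes")

# gsub() leaves the non-matching element exactly as it was
res_raw <- gsub(pattern, "\\2", x)

# hypothetical helper: NA for non-matching strings, numeric otherwise
extract_mid <- function(s) {
  as.numeric(ifelse(grepl(pattern, s), gsub(pattern, "\\2", s), NA))
}
res_num <- extract_mid(x)
```

Checking with grepl() first avoids the coercion warning that as.numeric() would otherwise raise on the unaltered string.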
Upvotes: 29
Reputation: 4767
Using rex may make this type of task a little simpler.
library(rex)
matches <- re_matches(mystrings,
rex(
"/",
any,
capture(name = "numbers", digits)
)
)
as.numeric(matches$numbers)
#> [1]  2  9 22 22
Upvotes: 1
Reputation: 44614
Using str_extract from the stringr package:
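library(stringr)
# perl() is deprecated since stringr 1.0.0; the default (ICU) regex engine handles these lookarounds
as.numeric(str_extract(mystrings, "(?<=/[A-Z])[0-9]+(?=/)"))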
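The same lookbehind/lookahead pattern can also be tried without stringr. The following is a base R sketch, not part of the original answer, and the variable names m and nums are arbitrary:

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")

# perl = TRUE is needed for the lookbehind (?<=...) and lookahead (?=...)
m <- regexpr("(?<=/[A-Z])[0-9]+(?=/)", mystrings, perl = TRUE)

# regmatches() pulls out the matched substrings, here one per element
nums <- as.numeric(regmatches(mystrings, m))
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than the input if some strings do not fit the pattern.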
Upvotes: 8
Reputation: 93813
This ended up being a compacted version of @RomanLuštrik's answer:
gsub("[^0-9]","",sapply(strsplit(mystrings,"/"),"[",2))
[1] "2" "9" "22" "22"
Upvotes: 4
Reputation: 70623
@Arun stole my thunder, so I'm giving my initial long-winded example.
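# split each string on "/"
cut.to.pieces <- strsplit(mystrings, split = "/")
# keep the second piece of each split
got.second <- lapply(cut.to.pieces, "[", 2)
get.numbers <- unlist(got.second)
# strip the letters and convert what is left to numeric
as.numeric(gsub(pattern = "[[:alpha:]]", replacement = "", x = get.numbers, perl = TRUE))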
[1]  2  9 22 22
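Since the question asks for a new column via transform(), the steps can be collapsed into one vectorized expression and used there. This is a sketch under assumptions: the data frame df and the column names code and num are hypothetical.

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")

# hypothetical data frame; keep the strings as character so strsplit() works
df <- data.frame(code = mystrings, stringsAsFactors = FALSE)

# split on "/", take the second piece, drop the letters, convert to numeric
df <- transform(
  df,
  num = as.numeric(gsub("[[:alpha:]]", "", sapply(strsplit(code, "/"), "[", 2)))
)
```

stringsAsFactors = FALSE matters on R versions before 4.0, where data.frame() would otherwise turn the column into a factor and strsplit() would fail.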
Upvotes: 8