user1320502

Reputation: 2570

Extract string between /

If I have these strings:

mystrings <- c("X2/D2/F4",
               "X10/D9/F4",
               "X3/D22/F4",
               "X9/D22/F9")

How can I extract 2, 9, 22 and 22? These characters are between the slashes and come after the first character of the part between them.

I would like to do this in a vectorized fashion and, if possible, add the new column with transform, which I am familiar with.

I think this regex gets me somewhere near all the characters between the slashes:

^.*\\'(.*)'\\.*$

Upvotes: 18

Views: 17598

Answers (7)

moodymudskipper

Reputation: 47300

Using the package unglue you could do :

# install.packages("unglue")
library(unglue)

unglue_vec(mystrings, "{x}/{y}/{z}", var = "y")
#> [1] "D2"  "D9"  "D22" "D22"

From a data frame you can use unglue_unnest(), so there is no need for transform():

df <- data.frame(col = mystrings)
unglue_unnest(df, col, "{x}/{y}/{z}", remove = FALSE)
#>         col   x   y  z
#> 1  X2/D2/F4  X2  D2 F4
#> 2 X10/D9/F4 X10  D9 F4
#> 3 X3/D22/F4  X3 D22 F4
#> 4 X9/D22/F9  X9 D22 F9

# or use unnamed subpatterns to keep only the middle value
unglue_unnest(df, col, "{=.*?}/{y}/{=.*?}", remove = FALSE)
#>         col   y
#> 1  X2/D2/F4  D2
#> 2 X10/D9/F4  D9
#> 3 X3/D22/F4 D22
#> 4 X9/D22/F9 D22

Created on 2019-11-06 by the reprex package (v0.3.0)

More info: https://github.com/moodymudskipper/unglue/blob/master/README.md
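For comparison, a base-R sketch of the same split into x, y and z columns without extra packages. It assumes every string has exactly three "/"-separated fields; the column names are only illustrative.

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")
df <- data.frame(col = mystrings, stringsAsFactors = FALSE)

# Split on a literal "/" and stack the pieces into a 4 x 3 character matrix
# (assumes exactly three fields per string)
parts <- do.call(rbind, strsplit(df$col, "/", fixed = TRUE))
colnames(parts) <- c("x", "y", "z")

# Append the pieces as new columns, keeping the original column
df <- cbind(df, as.data.frame(parts, stringsAsFactors = FALSE))
df
```

If some strings had a different number of fields, rbind() would recycle the shorter pieces (with a warning), which is why the fixed-field assumption matters here.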

Upvotes: 0

IRTFM

Reputation: 263331

> gsub("(^.+/[A-Z]+)(\\d+)(/.+$)", "\\2", mystrings)
[1] "2"  "9"  "22" "22"

You would "read" (or "parse") that regex pattern as splitting any matched string into three parts:

1) anything up to and including the first forward slash followed by a sequence of capital letters,

2) a sequence of digits (= "\d") before the next slash, and

3) from the next slash to the end.

The replacement "\\2" then returns only the second part.

Non-matched character strings would be returned unaltered.
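Since the question asks for transform(), here is a minimal sketch that uses the same pattern to add a numeric column (the column names col and num are hypothetical), plus a check of the unmatched case:

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")
df <- data.frame(col = mystrings, stringsAsFactors = FALSE)

# Keep only the second capture group (the digits) and store it as a number
df <- transform(df, num = as.numeric(gsub("(^.+/[A-Z]+)(\\d+)(/.+$)", "\\2", col)))
df$num

# A string the pattern does not match comes back unchanged
unmatched <- gsub("(^.+/[A-Z]+)(\\d+)(/.+$)", "\\2", "no slashes here")
unmatched
```

Because unmatched strings come back unchanged, wrapping the call in as.numeric() turns them into NA (with a coercion warning).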

Upvotes: 29

Jim

Reputation: 4767

Using rex may make this type of task a little simpler.

library(rex)

matches <- re_matches(mystrings,
  rex(
    "/",
    any,
    capture(name = "numbers", digits)
  )
)

as.numeric(matches$numbers)
#> [1]  2  9 22 22
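If you would rather not add the rex dependency, a base-R sketch of the same idea (a slash, one arbitrary character, then a captured run of digits) can use regexec() and regmatches():

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")

# regexec() records the first match and its capture group for each string
m <- regmatches(mystrings, regexec("/.([0-9]+)", mystrings))

# Each element is c(full match, capture); keep the capture (NA if no match)
numbers <- as.numeric(vapply(m, `[`, character(1), 2))
numbers
```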

Upvotes: 1

Matthew Plourde

Reputation: 44614

Using str_extract from the stringr package:

library(stringr)
# digits preceded by "/" plus one capital letter and followed by "/"
as.numeric(str_extract(mystrings, "(?<=/[A-Z])[0-9]+(?=/)"))
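The same lookbehind/lookahead pattern also works in base R with perl = TRUE. A minimal sketch; note that regmatches() with regexpr() silently drops strings that do not match, so the result can be shorter than the input:

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")

# (?<=/[A-Z]) requires "/" plus one capital letter before the digits,
# (?=/) requires a "/" right after them; neither is part of the match
hits <- regmatches(mystrings,
                   regexpr("(?<=/[A-Z])[0-9]+(?=/)", mystrings, perl = TRUE))
numbers <- as.numeric(hits)
numbers
```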

Upvotes: 8

thelatemail

Reputation: 93813

This ended up being a compacted version of @RomanLuštrik's answer:

gsub("[^0-9]","",sapply(strsplit(mystrings,"/"),"[",2))
[1] "2"  "9"  "22" "22"
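A variant of the same one-liner with vapply(), which guarantees a character vector with one element per input string (a string with no second field gives NA), and fixed = TRUE so the "/" is matched literally:

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")

# Take the second "/"-separated field of each string
middle <- vapply(strsplit(mystrings, "/", fixed = TRUE), `[`, character(1), 2)

# Drop everything that is not a digit, then convert
numbers <- as.numeric(gsub("[^0-9]", "", middle))
numbers
```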

Upvotes: 4

Roman Luštrik

Reputation: 70623

@Arun stole my thunder, so I'm giving my initial long-winded example.

cut.to.pieces <- strsplit(mystrings, split = "/")
got.second <- lapply(cut.to.pieces, "[", 2)
get.numbers <- unlist(got.second)
as.numeric(gsub(pattern = "[[:alpha:]]", replacement = "", x = get.numbers, perl = TRUE))
[1]  2  9 22 22
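To make each step concrete, here is the same pipeline with the intermediate values shown along the way (same variable names as above):

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")

cut.to.pieces <- strsplit(mystrings, split = "/")
cut.to.pieces[[1]]           # "X2" "D2" "F4": one character vector per string

got.second <- lapply(cut.to.pieces, "[", 2)
got.second[[2]]              # "D9": a list holding each second field

get.numbers <- unlist(got.second)
get.numbers                  # "D2" "D9" "D22" "D22": flattened to a vector

# [[:alpha:]] matches letters, so only the digits remain
as.numeric(gsub("[[:alpha:]]", "", get.numbers))
```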

Upvotes: 8

Arun

Reputation: 118779

as.numeric(gsub("^.*D([0-9]+).*$", "\\1", mystrings))
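This pattern relies on the middle field starting with D (and, because .* is greedy, on no later D being followed by digits). A sketch of a more general variant, assuming the middle field is always letters followed by digits, anchors on the first slash instead:

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")

# [^/]* skips the first field, [A-Za-z]+ skips the letter prefix of the
# middle field, and ([0-9]+) captures the digits right before the second "/"
pattern <- "^[^/]*/[A-Za-z]+([0-9]+)/.*$"
numbers <- as.numeric(sub(pattern, "\\1", mystrings))
numbers

# Other prefixes work too, e.g. "E" instead of "D"
other <- as.numeric(sub(pattern, "\\1", "X1/E7/F2"))
other
```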

Upvotes: 20
