Reputation: 611
Imagine the following string:
x <- "aokizoizeon1854-1zeoijzeoinq1234-2zeze"
I want to extract '1854-1' and '1234-2' from this string, i.e. every part of the string that consists of 4 digits followed by a '-' and one more digit.
What is the easiest way to do this?
Upvotes: 2
Views: 2080
Reputation: 24188
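A possible base R solution is to remove all the lowercase letters and then split what remains into 6-character chunks (this assumes every match is exactly 6 characters long and that only lowercase letters surround the matches):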
x1 <- gsub("[a-z]", "", x)
substring(x1, seq(1,nchar(x1),6), seq(6, nchar(x1), 6))
#[1] "1854-1" "1234-2"
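If the surrounding text might contain something other than lowercase letters, a more direct variant is to extract the pattern itself rather than delete everything around it. The following is only a sketch, assuming the same x as in the question; the variable name m is just for illustration:

```r
# Same input string as in the question
x <- "aokizoizeon1854-1zeoijzeoinq1234-2zeze"

# gregexpr() locates every occurrence of "4 digits, hyphen, 1 digit";
# regmatches() pulls out the matched substrings (a list with one element per input string)
m <- regmatches(x, gregexpr("[0-9]{4}-[0-9]", x))[[1]]
print(m)
#[1] "1854-1" "1234-2"
```

Note that this pattern would also match inside a longer run of digits, so it only fits inputs like the one shown.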
Upvotes: 3
Reputation: 626870
You can use str_extract_all from the stringr package with a regex that uses lookarounds (stringr uses the ICU regex engine, which supports them).
If your strings can contain digit sequences longer than 4 and you want to match only sequences of exactly 4 digits followed by a hyphen and then exactly one digit, you need lookarounds to restrict the matches:
> library(stringr)
> x <- "aokizoizeon1854-1zeoijzeoinq1234-2zeze12445664-345ff"
> unlist(str_extract_all(x, "(?<!\\d)\\d{4}-\\d(?!\\d)"))
[1] "1854-1" "1234-2"
The (?<!\\d)\\d{4}-\\d(?!\\d) regex matches:
(?<!\\d) - fail the match if there is a digit before the current position
\\d{4}-\\d - match 4 digits followed by a hyphen and 1 digit
(?!\\d) - but fail the match if that one digit is followed by another digit.
Upvotes: 2
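To see what the lookarounds change, the same pattern can be run with and without them. This is a rough sketch in base R rather than stringr, assuming perl = TRUE (which switches gregexpr to PCRE so that lookarounds are available); the names loose and strict are made up for the comparison:

```r
# Input with an extra, longer digit run at the end
x <- "aokizoizeon1854-1zeoijzeoinq1234-2zeze12445664-345ff"

# Without lookarounds: the tail of the long digit run also matches
loose <- regmatches(x, gregexpr("\\d{4}-\\d", x, perl = TRUE))[[1]]
print(loose)
#[1] "1854-1" "1234-2" "5664-3"

# With lookarounds: a digit before or after the candidate rejects it
strict <- regmatches(x, gregexpr("(?<!\\d)\\d{4}-\\d(?!\\d)", x, perl = TRUE))[[1]]
print(strict)
#[1] "1854-1" "1234-2"
```

The only difference between the two calls is the pair of lookarounds, which filters out "5664-3" because it is preceded and followed by digits.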