user5539866

Reputation:

R: How to extract specific digits from a string?

I want to retrieve the first number (here: 344002) from a string:

string <- '<a href="/Archiv-Suche/!344002&amp;s=&amp;SuchRahmen=Print/" ratiourl-ressource="344002"'

Preferably, I am looking for a regular expression that matches the digits after the ! and before the &amp;.

All I have come up with is this, but it catches the ! as well (!344002):

regmatches(string, gregexpr("\\!([[:digit:]]+)", string, perl = TRUE))

Any ideas?

Upvotes: 0

Views: 1015

Answers (3)

Wiktor Stribiżew

Reputation: 627546

You may capture the digits (\d+) between ! and &amp; and retrieve them with regexec/regmatches:

> string <- '<a href="/Archiv-Suche/!344002&amp;s=&amp;SuchRahmen=Print/" ratiourl-ressource="344002"'
> pattern = "!(\\d+)&amp;"
> res <- unlist(regmatches(string,regexec(pattern,string)))
> res[2]
[1] "344002"

See the online R demo
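
To make the indexing in res[2] easier to follow, here is a minimal sketch of what regexec/regmatches hands back for this string. The variable names (m, full_match, digits, digits_sub) are only illustrative, and the sub() line is an alternative I am adding for comparison, not part of the answer above.

```r
string <- '<a href="/Archiv-Suche/!344002&amp;s=&amp;SuchRahmen=Print/" ratiourl-ressource="344002"'

# regexec() records the full match followed by each capture group
m <- regmatches(string, regexec("!(\\d+)&amp;", string))[[1]]
full_match <- m[1]  # the whole match, including "!" and "&amp;"
digits <- m[2]      # only the first capture group

# Alternative: replace the entire string with capture group 1
digits_sub <- sub(".*?!(\\d+)&amp;.*", "\\1", string, perl = TRUE)
```

Element 1 is always the complete match, so the captured digits sit at index 2. The sub() variant only works as intended when the pattern actually matches; otherwise it returns the input unchanged.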

Upvotes: 0

JKim

Reputation: 155

library(gsubfn)
strapplyc(string, "!(\\d+)")[[1]]

[Old answer]

Try this code:

library(stringr)
str_extract(string, "[0-9]+")
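
One caveat with a bare "[0-9]+": it returns the first run of digits anywhere in the string, which happens to be the right one for the string in the question. A small sketch of the difference, using base R's regexpr/regmatches instead of stringr so it runs without extra packages. The input other is a made-up example, not from the question.

```r
# Hypothetical input where another number appears before the "!"
other <- '<a id="17" href="/Archiv-Suche/!344002&amp;s="'

# First run of digits anywhere in the string
first_digits <- regmatches(other, regexpr("[0-9]+", other))

# First run of digits that directly follows "!"
after_bang <- regmatches(other, regexpr("(?<=!)[0-9]+", other, perl = TRUE))
```

Here first_digits picks up the id value, while after_bang is anchored to the "!" and still finds the number in the link.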

A similar question and answer can be found here:

Extract a regular expression match in R version 2.10

Upvotes: 0

Nicolas

Reputation: 7121

Use this regex:

(?<=\!)\d+(?=&amp)

Use this code:

regmatches(string, gregexpr("(?<=\\!)\\d+(?=&amp)", string, perl=TRUE))

Note that each backslash has to be doubled inside an R string.

  • (?<=\!) is a lookbehind: the match starts right after the !
  • \d+ matches one or more digits
  • (?=&amp) is a lookahead: the match stops where the next characters are &amp
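
As a rough sketch of how the lookaround pattern behaves in practice: gregexpr returns one element per input string, so regmatches gives back a list, and unlist flattens it to a plain character vector. The names hits and ids are just for illustration.

```r
string <- '<a href="/Archiv-Suche/!344002&amp;s=&amp;SuchRahmen=Print/" ratiourl-ressource="344002"'

# Lookbehind and lookahead are zero-width, so only the digits are matched
hits <- regmatches(string, gregexpr("(?<=!)\\d+(?=&amp)", string, perl = TRUE))

# hits is a list with one character vector per input string
ids <- unlist(hits)

# Convert to a number if that is more convenient downstream
id_num <- as.integer(ids)
```

Because the ! and the &amp are only asserted, not consumed, neither shows up in the result, which is what the question asked for.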

Upvotes: 2
