Marcin

Reputation: 8044

Which regex to use in R?

Does anybody know which regex to use in R to extract the string stddata__2015_02_04 from the string "<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>"? You may assume that the beginning, stddata__201, is known and that only the ending changes from time to time.

Upvotes: 0

Views: 235

Answers (3)

Blue0500

Reputation: 725

I tend to agree with the other posters: regex is not the best way to do this. However, if you really want to do this with a regex, here it is.

(?<=>\s)([^<>\/]+)        # Works in PHP and Python, and most other languages; in R it needs perl = TRUE
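
For R specifically, here is a minimal sketch of applying this lookbehind idea with base R only. R's default regex engine does not support lookbehind, so `perl = TRUE` is needed, and the backslash has to be doubled inside an R string literal. The names `x`, `pattern`, `m` and `result` are only illustrative.

```r
# Input string from the question
x <- "<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>"

# Match everything after "> " up to the next "<", ">" or "/"
pattern <- "(?<=>\\s)[^<>/]+"

# regexpr() locates the first match; regmatches() extracts it
m <- regexpr(pattern, x, perl = TRUE)
result <- regmatches(x, m)
print(result)
```

This relies on the link text being preceded by "> ", so it is tied to the exact spacing in the input.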

Upvotes: 2

G. Grothendieck

Reputation: 269461

If the input is:

x <- "<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>"

then use sub:

sub(".*(stddata__201[_0-9]+).*", "\\1", x)

giving:

[1] "stddata__2015_02_04"
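
One thing to be aware of: `sub` returns its input unchanged when the pattern does not match. A minimal sketch of that behaviour, next to a `regexpr`/`regmatches` alternative that keeps only the matches. The second input string, with `other_dir`, is a made-up example, not from the question.

```r
x <- c("<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>",
       "<li><a href=\"other_dir/\"> other_dir/</a></li>")  # hypothetical non-matching line

# sub() leaves the second element untouched because the pattern does not match it
via_sub <- sub(".*(stddata__201[_0-9]+).*", "\\1", x)

# regexpr() + regmatches() drops elements without a match
via_regmatches <- regmatches(x, regexpr("stddata__201[_0-9]+", x))

print(via_sub)
print(via_regmatches)
```

If the input may contain lines without the prefix, the second form avoids passing whole HTML lines downstream by accident.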

For reference, this is the regular expression used (the original answer linked a Debuggex visualization of it):

.*(stddata__201[_0-9]+).*

Upvotes: 3

cory

Reputation: 6659

> library("stringr")
> str_extract("<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>",
+             "stddata__201[0-9]_[0-9]{2}_[0-9]{2}")
[1] "stddata__2015_02_04"

The preferred solution is not to use a regex at all, but to parse the HTML:

> library("rvest")
> # read_html() replaces html(), which is deprecated in later rvest versions
> "<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>" %>% 
+   read_html() %>% 
+   html_text()
[1] " stddata__2015_02_04/"
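
The text returned by the parser still has a leading space and a trailing slash. A minimal sketch of cleaning that up in base R, assuming the slash is always the last character. `txt` and `clean` are illustrative names, and `txt` is simply the output shown above typed in by hand.

```r
# Output of html_text() from above, copied as a literal
txt <- " stddata__2015_02_04/"

# trimws() drops surrounding whitespace; sub() removes one trailing slash
clean <- sub("/$", "", trimws(txt))
print(clean)
```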

Upvotes: 1
