Reputation: 8044
Does anybody know which regex to use in R to extract the string stddata__2015_02_04
from the string "<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>"?
You may assume that the beginning, stddata__201,
is known, and that only the ending changes from time to time.
Upvotes: 0
Views: 235
Reputation: 725
I tend to agree with the other posters: regex is not the best way to do this. However, if you really want to use a regex, here it is:
(?<=>\s)([^<>\/]+) # Works in PHP, Python and most other languages; in R, lookbehind requires perl = TRUE
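In R, the lookbehind above is only understood by the PCRE engine, so the call must pass perl = TRUE. A minimal base-R sketch, assuming the input is a single string like the one in the question (the variable names are illustrative):

```r
# The HTML snippet from the question
x <- "<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>"

# (?<=>\\s) : require "> " immediately before the match (lookbehind, PCRE only)
# [^<>/]+   : then take every character up to the next <, > or /
pattern <- "(?<=>\\s)[^<>/]+"

# regexpr() locates the first match; regmatches() extracts the matched text.
# perl = TRUE is needed because R's default TRE engine has no lookbehind.
res <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(res)
```

Without perl = TRUE, regexpr() reports the pattern as invalid rather than returning a match.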
Upvotes: 2
Reputation: 269461
If the input is:
x <- "<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>"
then use sub:
sub(".*(stddata__201[_0-9]+).*", "\\1", x)
giving:
[1] "stddata__2015_02_04"
[Visualization of the regular expression .*(stddata__201[_0-9]+).*]
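Because sub() is vectorized, the same pattern can be applied to many lines of a directory listing at once. One caveat: sub() returns a non-matching element unchanged, so it is safer to keep only the matching lines first. A minimal sketch; the listing vector below is made-up sample input, not real data:

```r
# Hypothetical sample lines from a directory listing (illustrative only)
listing <- c(
  "<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>",
  "<li><a href=\"stddata__2015_03_11/\"> stddata__2015_03_11/</a></li>",
  "<li><a href=\"../\"> Parent Directory</a></li>"
)

pattern <- ".*(stddata__201[_0-9]+).*"

# sub() would pass the "Parent Directory" line through untouched,
# so drop lines that do not match before extracting the capture group.
matching <- listing[grepl(pattern, listing)]
dirs <- sub(pattern, "\\1", matching)
print(dirs)
```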
Upvotes: 3
Reputation: 6659
> library("stringr")
> str_extract("<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>",
+ "stddata__201[0-9]_[0-9]{2}_[0-9]{2}")
[1] "stddata__2015_02_04"
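If you would rather avoid the stringr dependency, base R can perform the same extraction with regexpr() and regmatches(). A minimal sketch reusing the pattern from the str_extract() call:

```r
x <- "<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>"

# Same pattern as above: the known prefix, a year starting with 201,
# then two-digit month and day separated by underscores.
pattern <- "stddata__201[0-9]_[0-9]{2}_[0-9]{2}"

# regexpr() finds the first match, as str_extract() does;
# regmatches() turns that match position into the substring.
res <- regmatches(x, regexpr(pattern, x))
print(res)
```

One difference to keep in mind: for a non-matching input, str_extract() returns NA, whereas regmatches() with regexpr() simply drops that element.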
The preferred solution is not to use a regex at all, but to parse the HTML:
> library("rvest")
> "<li><a href=\"stddata__2015_02_04/\"> stddata__2015_02_04/</a></li>" %>%
+   read_html() %>%
+ html_text()
[1] " stddata__2015_02_04/"
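The text returned by html_text() still carries the leading space and trailing slash from the link label. A minimal base-R sketch for tidying it, assuming the string printed above is what the pipeline returned:

```r
# Output of the rvest pipeline above, copied here as a literal string
txt <- " stddata__2015_02_04/"

# trimws() strips surrounding whitespace; sub() removes a single trailing "/"
clean <- sub("/$", "", trimws(txt))
print(clean)
```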
Upvotes: 1