Reputation: 53826
Using the code below, I'm extracting a generated HTML link:
mystr <- c("/url?q=http://www.mypage.html&sa=U&ved=0ahUKEwjgyMPj2pXXAhWB5CYKHXysDlsQqQIIKSgAMAg&usg=AOvVaw1VCvT8iznodM3l4xvc8CVq")
stringr::str_extract(mystr, "^.*(?=(&sa))")
This returns:
[1] "/url?q=http://www.mypage.html"
How can I modify the regex to exclude /url?q= so that just http://www.mypage.html is returned?
Upvotes: 1
Views: 43
Reputation: 626853
You may also use a base R sub solution that matches up to the first http and captures it together with any chars other than &:
sub(".*?(http[^&]*).*", "\\1", mystr)
You may make the pattern more precise by adding q= after .*? so that it matches only after q=.
Details
.*? - any 0+ chars, as few as possible
(http[^&]*) - capturing group #1 matching http and then any zero or more chars other than &
.* - the rest of the string.
The \1 is a replacement backreference to the Group 1 value.
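The q= refinement mentioned above can be sketched as follows. This is a minimal base R illustration using the string from the question; the variable name res is only for this example.

```r
mystr <- "/url?q=http://www.mypage.html&sa=U&ved=0ahUKEwjgyMPj2pXXAhWB5CYKHXysDlsQqQIIKSgAMAg&usg=AOvVaw1VCvT8iznodM3l4xvc8CVq"

# Require "q=" right before the captured URL, then keep only group 1
res <- sub(".*?q=(http[^&]*).*", "\\1", mystr)
res
```

Note that sub returns the input unchanged when the pattern does not match, so it may be worth checking the result if some strings lack a q= parameter.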
Upvotes: 1
Reputation: 51592
You can replace the start-of-string anchor (i.e. ^) in your pattern with http:
stringr::str_extract(mystr, "http.*(?=(&sa))")
#[1] "http://www.mypage.html"
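For comparison, here is a dependency-free sketch of the same lookahead idea in base R with perl = TRUE. It is not part of the answer above: the lookbehind on q= and the lazy quantifier are assumptions added for illustration, and the names m and url are only for this example.

```r
mystr <- "/url?q=http://www.mypage.html&sa=U&ved=0ahUKEwjgyMPj2pXXAhWB5CYKHXysDlsQqQIIKSgAMAg&usg=AOvVaw1VCvT8iznodM3l4xvc8CVq"

# (?<=q=) requires "q=" just before the match without including it;
# the lazy .*? stops at the first "&sa" rather than the last one
m <- regexpr("(?<=q=)http.*?(?=&sa)", mystr, perl = TRUE)
url <- regmatches(mystr, m)
url
```

When nothing matches, regmatches returns character(0) here instead of the original string.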
Upvotes: 1