Reputation: 639
I have written a regex query to extract a URL. Sample text:
<p>a tinyurl here <a href="https://vvvconf.instenv.atl-test.space/x/HQAU">https://vvvconf.instenv.atl-test.space/x/HQAU</a></p>
URL I need to extract:
https://vvvconf.instenv.atl-test.space/x/HQAU
My regex attempts:
https:\/\/vvvconf.[a-z].*\/x\/[a-zA-z0-9]*
This extracts:
{"https://vvvconf.instenv.atl-test.space/x/HQAU">https://vvvconf.instenv.atl-test.space/x/HQAU"}
>https:\/\/vvvconf.[a-z].*\/x\/[a-zA-z0-9]*<
This extracts:
{">https://vvvconf.instenv.atl-test.space/x/HQAU<"}
How can I refine the regex so I just extract the URL https://vvvconf.instenv.atl-test.space/x/HQAU
?
Upvotes: 0
Views: 31
Reputation: 7616
Depends if you want to extract the URL from the href
attribute, or the text within the a
tag. Assuming the latter you can use a positive lookbehind (if your regex flavor supports it)
const input = '<p>a tinyurl here <a href="https://vvvconf.instenv.atl-test.space/x/HQAU">https://vvvconf.instenv.atl-test.space/x/HQAU</a></p>';
const regex = /(?<=[>])https:[^<]*/;
console.log(input.match(regex))
Output:
[
"https://vvvconf.instenv.atl-test.space/x/HQAU"
]
Explanation:
(?<=[>])
- positive lookbehind for >
(expects but excludes >
)https:
- expect start of URL (add as much as desired)[^<]*
- scan over everything that is not <
Upvotes: 1