writing a regex which is working pretty well

Question

I have written a regex query to extract a URL. Sample text:

a tinyurl here https://vvvconf.instenv.atl-test.space/x/HQAU

URL I need to extract:

https://vvvconf.instenv.atl-test.space/x/HQAU

My regex attempts:

https:\/\/vvvconf.[a-z].*\/x\/[a-zA-z0-9]*

This extracts:

{"https://vvvconf.instenv.atl-test.space/x/HQAU">https://vvvconf.instenv.atl-test.space/x/HQAU"}

>https:\/\/vvvconf.[a-z].*\/x\/[a-zA-z0-9]*<

This extracts:

{">https://vvvconf.instenv.atl-test.space/x/HQAU<"}

How can I refine the regex so I just extract the URL https://vvvconf.instenv.atl-test.space/x/HQAU?

Peter Thoeny · Accepted Answer

Depends if you want to extract the URL from the href attribute, or the text within the a tag. Assuming the latter you can use a positive lookbehind (if your regex flavor supports it)

const input = 'a tinyurl here https://vvvconf.instenv.atl-test.space/x/HQAU';
const regex = /(?<=[>])https:[^<]*/;
console.log(input.match(regex))

Output:

[
  "https://vvvconf.instenv.atl-test.space/x/HQAU"
]

Explanation:

(?<=[>]) - positive lookbehind for > (expects but excludes >)
https: - expect start of URL (add as much as desired)
[^<]* - scan over everything that is not <

writing a regex which is working pretty well

Answers (1)

Related Questions