Vik G
Vik G

Reputation: 639

writing a regex which is working pretty well

I have written a regex query to extract a URL. Sample text:

<p>a tinyurl here <a href="https://vvvconf.instenv.atl-test.space/x/HQAU">https://vvvconf.instenv.atl-test.space/x/HQAU</a></p>

URL I need to extract:

https://vvvconf.instenv.atl-test.space/x/HQAU

My regex attempts:

  1. https:\/\/vvvconf.[a-z].*\/x\/[a-zA-z0-9]*
    

    This extracts:

    {"https://vvvconf.instenv.atl-test.space/x/HQAU">https://vvvconf.instenv.atl-test.space/x/HQAU"}
    
  2. >https:\/\/vvvconf.[a-z].*\/x\/[a-zA-z0-9]*<
    

    This extracts:

    {">https://vvvconf.instenv.atl-test.space/x/HQAU<"}
    

How can I refine the regex so I just extract the URL https://vvvconf.instenv.atl-test.space/x/HQAU?

Upvotes: 0

Views: 31

Answers (1)

Peter Thoeny
Peter Thoeny

Reputation: 7616

Depends if you want to extract the URL from the href attribute, or the text within the a tag. Assuming the latter you can use a positive lookbehind (if your regex flavor supports it)

const input = '<p>a tinyurl here <a href="https://vvvconf.instenv.atl-test.space/x/HQAU">https://vvvconf.instenv.atl-test.space/x/HQAU</a></p>';
const regex = /(?<=[>])https:[^<]*/;
console.log(input.match(regex))

Output:

[
  "https://vvvconf.instenv.atl-test.space/x/HQAU"
]

Explanation:

  • (?<=[>]) - positive lookbehind for > (expects but excludes >)
  • https: - expect start of URL (add as much as desired)
  • [^<]* - scan over everything that is not <

Upvotes: 1

Related Questions