Robin Alexander

Reputation: 1004

Regex URL extract without www

I use the expression /href *= *[\'"]\Khttps?:\/\/(?:www\.)?twitter\.com[^\'"]+/ to extract Twitter URLs. It works for URLs that start with www, but not when www is missing. What do I have to change so that both kinds of links (with and without www) are matched?

<a href="//www.twitter.com/anything">LINK1</a>
<a href="//twitter.com/anything">LINK2</a>

Thanks for your help!

Yes, I know there are other posts about this issue with proposed solutions, but none of them helped me solve it.

Upvotes: 0

Views: 47

Answers (1)

logi-kal

Reputation: 7880

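www is not the problem: (?:www\.)? already makes it optional. Your pattern requires https?: before the slashes, but your example links are protocol-relative (they start with //). You have to make the scheme optional as well: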

href *= *[\'"]\K(?:https?:)?\/\/(?:www\.)?twitter\.com[^\'"]+

See demo.
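
For a quick check outside a regex tester, here is a minimal sketch in Python. Note that Python's re module does not support \K, so this version swaps in a capturing group around the URL instead; the rest of the pattern is unchanged. The third and fourth links in the sample are hypothetical additions, included only to show that a full https URL still matches and that another domain does not.

```python
import re

# Same pattern as above, but with a capturing group in place of \K,
# because Python's re module has no \K (an assumption-free swap, not
# the original PCRE syntax).
PATTERN = re.compile(
    r"""href *= *['"]((?:https?:)?//(?:www\.)?twitter\.com[^'"]+)"""
)

# First two links come from the question; the last two are hypothetical.
html = '''
<a href="//www.twitter.com/anything">LINK1</a>
<a href="//twitter.com/anything">LINK2</a>
<a href="https://twitter.com/other">LINK3</a>
<a href="//example.com/page">LINK4</a>
'''

# With exactly one group, findall returns the captured URL strings.
urls = PATTERN.findall(html)
print(urls)
```

In an engine that supports \K (PCRE, for example), the pattern from the answer can be used as written and the whole match is the URL.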

Upvotes: 3
