Reputation: 1050
I wrote a regex that finds all links in the text.
(?s)(?m)(?i).*(http://[0-9a-z.%/_-]+).?".*
How can I exclude links to images, scripts, etc.?
Upvotes: 2
Views: 89
Reputation: 8332
This one is messy, but gets the job done:
(?!https?:\/\/[\w%\/_.-]+\.(jpg|js|gif))(https?:\/\/[\w%\/_.-]+\.\w+)
It's a negative lookahead that rules out the unwanted links, followed by an "all links" capture. Maybe not the most elegant solution, but it works.
It also allows https. To exclude more link types, add them to the (jpg|js|gif) list, separated by a vertical bar.
I'm not sure about Java, but it works in the flavours regex101 offers. Use the global flag.
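For Java specifically, here is a minimal sketch of how the pattern above could be used. The class name LinkFilter, the method findLinks and the sample HTML are made up for illustration. Java has no global flag, so looping over Matcher.find() plays that role; the backslashes are doubled for the string literal and the escaped slashes are written as plain "/", which should not change what matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original posts.
public class LinkFilter {
    // Same pattern as above: a negative lookahead rejects links ending in an
    // unwanted extension, then the second group matches the link itself.
    private static final Pattern LINK = Pattern.compile(
        "(?!https?://[\\w%/_.-]+\\.(jpg|js|gif))(https?://[\\w%/_.-]+\\.\\w+)",
        Pattern.CASE_INSENSITIVE);

    // Collects every match; repeated find() calls stand in for the global flag.
    static List<String> findLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            links.add(m.group()); // the whole match is the link
        }
        return links;
    }

    public static void main(String[] args) {
        // Assumed sample input: one page link, one image, one script.
        String html = "<a href=\"http://example.com/page.html\">x</a>"
                    + "<img src=\"http://example.com/logo.gif\">"
                    + "<script src=\"https://example.com/app.js\"></script>";
        System.out.println(findLinks(html)); // [http://example.com/page.html]
    }
}
```

One thing to watch, as far as I can tell: the lookahead also rejects extensions that merely start with a listed one (for example .json, because of js). Putting \b right after the (jpg|js|gif) group should tighten that, but test it against your own data.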
Upvotes: 2