Get list of links from large text file

Question

I have a huge text file, 20k+ lines, and I want to extract links from it.

What I need is a regular expression that generates a clean list of links.

The links I need start with http:// (without www) and end with .html.

What would the expression look like?

deW1 · Accepted Answer

For websites in general (http or https) whose pages end in .html, it would look like this:

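(http|https):\/\/[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\/\S*?\.html

The dot before html is escaped so it only matches a literal dot. The lazy \S*? cannot cross whitespace and stops at the first .html, so two links on the same line are not merged into one match.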
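In JavaScript, applying the general pattern to a whole text could look roughly like this. It is a minimal sketch, assuming the text is already loaded into a string; the helper name extractLinks and the sample text are hypothetical, and the regex is a corrected form of the pattern (escaped dot before html, lazy whitespace-free path).

```javascript
// Corrected general pattern: "http" or "https", a host name with at least one dot,
// then a path with no whitespace that ends at the first ".html".
// The g flag makes String.prototype.match return every match, not just the first.
const GENERAL_LINK_RE = /(http|https):\/\/[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\/\S*?\.html/g;

// Hypothetical helper: returns all matching links in text (an empty array if none).
function extractLinks(text) {
  return text.match(GENERAL_LINK_RE) || [];
}

const generalSample =
  "See http://example.com/a.html and https://www.test.org/dir/b.html, " +
  "but not http://example.com/c.htm.";

// Prints the two .html links, one per line; the .htm link is skipped.
console.log(extractLinks(generalSample).join("\n"));
```

With the g flag, match returns only the full matches, so the (http|https) group does not add extra entries to the result.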

And to match exactly what you specified:

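http:\/\/[a-zA-Z0-9\-]+\.[a-z]{2,}\/[a-zA-Z0-9\-]+\.html

Here \. matches exactly one literal dot, and because the host part contains no dots before the top-level domain, www. and other subdomains are excluded. Only a single path segment before .html is accepted.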
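To check that the strict pattern really keeps only plain http:// links without www, a small JavaScript sketch can run it over a few sample links. The helper name extractStrictLinks and the samples are hypothetical, and the regex is a corrected form of the strict pattern (single escaped dot, escaped dot before html).

```javascript
// Corrected strict pattern: "http://" only (no https), one host label with no
// "www." or other subdomain, a lowercase top-level domain, one path segment, ".html".
const STRICT_LINK_RE = /http:\/\/[a-zA-Z0-9\-]+\.[a-z]{2,}\/[a-zA-Z0-9\-]+\.html/g;

// Hypothetical helper: returns all strict matches in text (an empty array if none).
function extractStrictLinks(text) {
  return text.match(STRICT_LINK_RE) || [];
}

const strictSample = [
  "http://test.com/a.html",     // kept
  "http://www.test.com/b.html", // skipped: has a www. subdomain
  "https://test.com/c.html",    // skipped: https, not http
  "http://test.com/d.htm",      // skipped: does not end in .html
].join(" ");

console.log(extractStrictLinks(strictSample).join("\n")); // → http://test.com/a.html
```

Because the path class excludes punctuation, two links separated only by a comma are still returned as two separate matches.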

Then just cut the matches (Ctrl+X) and paste them (Ctrl+V) into a new file, and you've got your list.

This works in JavaScript, Notepad++, and so on.

\b matches a word boundary, so wrapping the pattern in \b ... \b restricts it to whole words: in text like ewkgml http://test.com/a.html lamklwmwtmk it will find the link. \B is the negation (a position that is not a word boundary), so wrapping the pattern in \B ... \B instead matches a link glued to the surrounding text, as in wegniwgnwkjnhttp://test.com/a.htmllmwtlkmt34lt. | is the OR operator, as in (http|https).
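Neither regex in this answer actually contains \b or \B, so here is a minimal JavaScript sketch of what adding them would do, using the two example strings from the text. The constant names are hypothetical, and the core is a corrected form of the strict pattern.

```javascript
// Corrected strict pattern with no anchors (String.raw keeps the backslashes as-is).
const CORE = String.raw`http:\/\/[a-zA-Z0-9\-]+\.[a-z]{2,}\/[a-zA-Z0-9\-]+\.html`;

const PLAIN = new RegExp(CORE);                                        // no anchors
const WHOLE_WORD = new RegExp(String.raw`\b` + CORE + String.raw`\b`); // link must stand alone
const GLUED = new RegExp(String.raw`\B` + CORE + String.raw`\B`);      // link must touch word chars

const spaced = "ewkgml http://test.com/a.html lamklwmwtmk";
const glued = "wegniwgnwkjnhttp://test.com/a.htmllmwtlkmt34lt";

// Each line prints: label, match in spaced text?, match in glued text?
for (const [label, re] of [["plain", PLAIN], ["\\b...\\b", WHOLE_WORD], ["\\B...\\B", GLUED]]) {
  console.log(label, re.test(spaced), re.test(glued));
}
// plain true true
// \b...\b true false
// \B...\B false true
```

The regexes are created without the g flag, so repeated test() calls do not carry lastIndex state between strings. Without any anchor, the pattern finds the link in both strings.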
