MeetJoeBlack

Reputation: 2904

Regex to grab all ".js" and ".css" href links from file

I have a string with HTML content and I need to grab all links to .css and .js files. I am currently using the pattern "(http:.*?.\\.css)" to grab all CSS links, but how can I include .js links too?

Here is my full code:

List<String> urlList = new ArrayList<String>();
String str = new String(Files.readAllBytes(FileSystems.getDefault().getPath("c:" + File.separator + "nutchfiles" + File.separator + "test.html")));
Pattern p = Pattern.compile("(http:.*?.\\.css)");
Matcher m = p.matcher(str);

while (m.find()) {
    // Collect each match as well as logging it
    urlList.add(m.group());
    LOG.info("matched url: " + m.group());
}

Upvotes: 1

Views: 1175

Answers (1)

Wiktor Stribiżew

Reputation: 626747

If you are looking for a regex fix, here it is:

Pattern p = Pattern.compile("(http:.*?\\.(?:css|js)\\b)");

The alternation group (?:css|js) matches either extension, and the trailing \b keeps a match from ending inside a longer extension such as .json. See Alternation with The Vertical Bar or Pipe Symbol:

If you want to search for the literal text cat or dog, separate both options with a vertical bar or pipe symbol: cat|dog. If you want more options, simply expand the list: cat|dog|mouse|fish.
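
As a rough illustration of the alternation in use, here is a minimal sketch that applies the pattern above to an in-memory string and collects the matches. The class name AssetLinkRegex, the method name findAssetUrls, and the sample URLs are made up for this example; only the pattern itself comes from the answer. Note that, as written, the pattern only matches URLs starting with http: and not https:.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class AssetLinkRegex {
    // Same pattern as in the answer: lazy match from "http:" up to ".css" or ".js".
    private static final Pattern ASSET_URL =
            Pattern.compile("(http:.*?\\.(?:css|js)\\b)");

    // Returns every substring of html that the pattern matches, in order.
    static List<String> findAssetUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = ASSET_URL.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample input invented for the demonstration.
        String html = "<link href=\"http://example.com/a.css\">"
                + "<script src=\"http://example.com/b.js\"></script>";
        for (String url : findAssetUrls(html)) {
            System.out.println(url);
        }
    }
}
```

Because .*? does not care what it is crossing, this approach can also pick up URLs that appear in comments or plain text, which is one reason a parser is usually the safer route.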

However, you would be safer using an HTML parser to extract the links from your HTML files, since a regex cannot tell a real link attribute from text that merely looks like a URL.
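
To give an idea of what the parser route could look like, here is a minimal sketch that uses the HTML parser bundled with the JDK (javax.swing.text.html) rather than a third-party library. That choice is an assumption made to keep the example dependency-free; the answer does not name a particular parser, and a dedicated library may cope better with modern markup. The class name AssetLinkParser and the method name extractAssetLinks are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the original question or answer.
class AssetLinkParser {
    // Collects href values of <link> tags and src values of <script> tags
    // that end in ".css" or ".js".
    static List<String> extractAssetLinks(String html) {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(tag, attrs);
            }

            private void collect(HTML.Tag tag, MutableAttributeSet attrs) {
                Object url = null;
                if (tag == HTML.Tag.LINK) {
                    url = attrs.getAttribute(HTML.Attribute.HREF);
                } else if (tag == HTML.Tag.SCRIPT) {
                    url = attrs.getAttribute(HTML.Attribute.SRC);
                }
                if (url != null) {
                    String s = url.toString();
                    if (s.endsWith(".css") || s.endsWith(".js")) {
                        links.add(s);
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }
}
```

Unlike the regex, this only looks at the href and src attributes of the relevant tags, so it also picks up relative and https links and ignores URLs that merely appear in the page text.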

Upvotes: 3
