Reputation: 2904
I have a string with HTML content and I need to grab all links to .css and .js files. Right now I am using the pattern "(http:.*?.\\.css)"
to grab all CSS links, but how can I include .js links, too?
Here is my full code:
List<String> urlList = new ArrayList<String>();
String str = new String(Files.readAllBytes(FileSystems.getDefault().getPath("c:" + File.separator + "nutchfiles" + File.separator + "test.html")));
Pattern p = Pattern.compile("(http:.*?.\\.css)");
Matcher m = p.matcher(str);
while (m.find()) {
    LOG.info("matched urls" + m.group());
}
Upvotes: 1
Views: 1175
Reputation: 626747
If you are looking for a regex fix, here it is:
Pattern p = Pattern.compile("(http:.*?\\.(?:css|js)\\b)");
The alternation will help you match both extensions. See Alternation with The Vertical Bar or Pipe Symbol:
"If you want to search for the literal text cat or dog, separate both options with a vertical bar or pipe symbol: cat|dog. If you want more options, simply expand the list: cat|dog|mouse|fish."
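To see the alternation at work, here is a minimal, self-contained sketch that applies the same pattern to an in-memory string instead of a file. The class name AssetLinkFinder, the method findAssetUrls, and the example.com URLs are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AssetLinkFinder {
    // Same pattern as above: lazily match from "http:" up to the first
    // ".css" or ".js" that is followed by a word boundary.
    private static final Pattern ASSET_URL =
            Pattern.compile("(http:.*?\\.(?:css|js)\\b)");

    // Hypothetical helper: collects every match of the pattern in the given HTML text.
    public static List<String> findAssetUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = ASSET_URL.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample input (an assumption for this sketch), one tag per line.
        String html = "<link rel=\"stylesheet\" href=\"http://example.com/site.css\">\n"
                + "<script src=\"http://example.com/app.js\"></script>\n"
                + "<img src=\"http://example.com/logo.png\">";
        for (String url : findAssetUrls(html)) {
            System.out.println("matched url: " + url);
        }
    }
}
```

The \b after the group keeps ".js" from matching inside a longer extension such as ".json". Note that the pattern only looks for "http:", so "https:" links are matched starting from wherever a later "http:" appears, if at all.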
However, you'd be safer using an HTML parser to extract content from your HTML files, since a regex can easily match text that is not actually a link.
Upvotes: 3