JavaProphet

Reputation: 991

RegEx match multiple expressions?

(<link.*>)|(<img.*>)|(<input.*type=\"image\".*>)|(<script.*src=\".*\".*>)

I'm writing a regular expression to replace all occurrences of inline static content in HTML with Base64 data URIs (the replacement itself isn't relevant here). Each of the expressions works perfectly on its own, but I need the matches in document order, and writing a sorting algorithm would be kind of insane with the data I'm working with (this is all already insane). I figured the combined pattern above should work, but it doesn't: it matches the first expression in there, but not the others. How do you match any of the expressions?

<link.*>
<img.*>
<input.*type=\"image\".*>
<script.*src=\".*\".*>

My Java Code:

    private final Pattern inlineLink = Pattern.compile("(<link.*>)|(<img.*>)|(<input.*type=\"image\".*>)|(<script.*src=\".*\".*>)", Pattern.CASE_INSENSITIVE);

    Matcher mtch = inlineLink.matcher(html);
    while (mtch.find()) {
        String o = mtch.group();
        if (!o.contains("href=")) continue;
        String href = o.substring(o.indexOf("href=") + 5);
        if (href.startsWith("\"")) {
            href = href.substring(1, href.indexOf("\"", 1));
        } else {
            href = href.substring(0, href.indexOf(" "));
        }
        href = processHREF(href);
        // do other stuffs

Upvotes: 0

Views: 113

Answers (1)

Wiktor Stribiżew

Reputation: 626691

I suggest doubling the backslashes and placing all the alternatives inside a single capturing group:

Pattern inlineLink = Pattern.compile("(<link.*>|<img.*>|<input.*type=\\\"image\\\".*>|<script.*src=\\\".*\\\".*>)", Pattern.CASE_INSENSITIVE);
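
If only the first alternative still seems to match, one likely cause is the greedy `.*`: when several tags sit on the same line, `<link.*>` runs to the last `>` on that line and swallows the tags after it. Below is a minimal sketch of the same alternation with `[^>]*` in place of `.*`, so that each match ends at its own closing `>`. The class name `InlineTagFinder`, the method `findTags`, and the sample HTML are made up for illustration, and the sketch assumes no attribute value contains a literal `>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the question's code.
class InlineTagFinder {

    // [^>]* cannot cross a '>', so every match covers exactly one tag.
    // \b keeps "<link" from matching a longer tag name such as "<linker".
    private static final Pattern INLINE_TAG = Pattern.compile(
            "<link\\b[^>]*>"
            + "|<img\\b[^>]*>"
            + "|<input\\b[^>]*type=\"image\"[^>]*>"
            + "|<script\\b[^>]*src=\"[^\"]*\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns the matched tags in the order they appear in the input.
    static List<String> findTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = INLINE_TAG.matcher(html);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Assumed sample input: three tags on a single line.
        String html = "<link href=\"a.css\"><img src=\"b.png\">"
                + "<script src=\"c.js\"></script>";
        for (String tag : findTags(html)) {
            System.out.println(tag);
        }
    }
}
```

Because `find()` scans left to right, the tags come back in document order, so no separate sorting step is needed. Note also that the loop in the question skips every match without `href=`, which would drop the `<img>` and `<script>` tags (they use `src=`) even when the pattern does match them.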

Upvotes: 1
