Abhishek Ranjan

Reputation: 931

Regex to replace attributes of style from an HTML string

I want a regex that removes a given list of properties from within the style attribute of a given HTML tag.

Example: I want to remove height and cursor from the style of a span tag.

Input:

String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";

Output:

<span id="nav-askquestion" style="width:200px;" name="questions"><b>hh</b></span>

I have the following regex, but it removes every occurrence of height and cursor in the string, not just those inside the style attribute of the span tag:

String cleanString=htmlFragment.replaceAll("(height|cursor)[ ]*:[ ]*[^;]+;",""); 

I am not looking to use an HTML parser for this because of a specific requirement.

Upvotes: 0

Views: 2658

Answers (3)

Frelling

Reputation: 3507

I agree with others that it would be better to use HTML/XML parsers, which allow you to drill down to specific elements without worrying about any "accidental" regex matches.

However, having read Xlsx's comment, "You cannot use only one RegEx for this," I felt compelled to post this solution using capture groups. It is for demonstration purposes only:

String reg = "(<span.+)((height|cursor) *:[^;]+;)(.*)((height|cursor) *:[^;]+;)(.*)";

String cleanString=htmlFragment.replaceAll(reg, "$1$4$7"); 

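Obviously, it is not pretty, and it may still match on some HTML content (as opposed to tags), but it is possible. Note that the pattern only matches when the span tag is followed by two such declarations, so a fragment containing just one of them is left unchanged. Unless this is intended as a quick fix, I urge you to use a more appropriate solution, as suggested by others. One possible solution would be jsoup.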
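If a regex-only approach is a hard requirement, a two-step variant may be less brittle: first match the style attribute of the span tag, then strip the unwanted declarations inside the matched value only. The following is a minimal sketch using just java.util.regex. The class name SpanStyleCleaner and the method clean are made up for illustration, and it assumes the style value is double-quoted and that no attribute value before it contains a ">" character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpanStyleCleaner {

    // Group 1: the opening span tag up to and including style="
    // Group 2: the style value; Group 3: the closing quote.
    private static final Pattern SPAN_STYLE = Pattern.compile(
            "(<span\\b[^>]*?\\sstyle\\s*=\\s*\")([^\"]*)(\")",
            Pattern.CASE_INSENSITIVE);

    // One height or cursor declaration. The lookbehind keeps names such as
    // line-height or min-height from being matched by their "height" suffix.
    private static final Pattern DECLARATION = Pattern.compile(
            "(?<![\\w-])(?:height|cursor)\\s*:[^;]*;?\\s*",
            Pattern.CASE_INSENSITIVE);

    public static String clean(String html) {
        Matcher m = SPAN_STYLE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Only the captured style value is rewritten; the rest of the tag is copied as-is.
            String cleanedStyle = DECLARATION.matcher(m.group(2)).replaceAll("");
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + cleanedStyle + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        System.out.println(clean(htmlFragment));
    }
}
```

Because the inner replacement only ever sees the captured style value, text content and other tags that happen to contain "height:" or "cursor:" are not affected. It is still a regex over HTML, so single-quoted or unquoted attributes would need extra handling.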

Upvotes: 1

Lonnie Best

Reputation: 11364

Try this regular expression:

\s*(height|cursor)\s*:\s*.+?\s*;\s*

You can test it out here.

If there are other properties besides height and cursor that you want to capture, just keep adding them to the alternation, separated by bars: (background-color|height|font-size), etc.
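In Java, the backslashes have to be doubled inside a string literal, and the alternation can be assembled from a list instead of being typed by hand. Here is a rough sketch of that idea. DeclarationPattern and forProperties are hypothetical names, and Pattern.quote is used so that each property name is treated literally. Keep in mind that the pattern is not scoped to a tag or to the style attribute, so it matches these declarations anywhere in the input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DeclarationPattern {

    // Builds \s*(name1|name2|...)\s*:\s*.+?\s*;\s* from the given property names.
    public static Pattern forProperties(List<String> names) {
        String alternation = names.stream()
                .map(Pattern::quote)               // treat each name as a literal
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\s*(" + alternation + ")\\s*:\\s*.+?\\s*;\\s*");
    }

    public static void main(String[] args) {
        Pattern p = forProperties(Arrays.asList("background-color", "height", "font-size"));
        String style = "width:200px; height:100px; font-size:12px; color:red;";
        // Removes the height and font-size declarations along with the whitespace around them.
        System.out.println(p.matcher(style).replaceAll(""));
    }
}
```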

Upvotes: 1

ninja.coder

Reputation: 9648

As I said before, I strongly suggest not using regex for this. Use an HTML/XML parser to parse the tags and data, and then do all your operations on the result.

But if you don't want to do that for some reason, then I would suggest falling back to basic substring-based methods rather than using regex.

Here is a sample code snippet for the above situation:

public static void main(String[] args) {
    String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
    int startIndex = htmlFragment.indexOf("<span");
    int stopIndex = htmlFragment.indexOf("</span>") + 7;

    /* Cursor */
    int cursorStart = htmlFragment.indexOf("cursor:", startIndex);
    int cursorEnd = htmlFragment.indexOf(";", cursorStart);
    htmlFragment = new StringBuilder()
            .append(htmlFragment.substring(startIndex, cursorStart))
            .append(htmlFragment.substring(cursorEnd + 1, stopIndex))
            .toString();

    /* Update Indices: the rebuilt string now starts at the span tag */
    startIndex = htmlFragment.indexOf("<span");
    stopIndex = htmlFragment.indexOf("</span>") + 7;

    /* Height */
    int heightStart = htmlFragment.indexOf("height:", startIndex);
    int heightEnd = htmlFragment.indexOf(";", heightStart);
    htmlFragment = new StringBuilder()
            .append(htmlFragment.substring(startIndex, heightStart))
            .append(htmlFragment.substring(heightEnd + 1, stopIndex))
            .toString();

    /* Output */
    System.out.println(htmlFragment);
}

I know it looks a bit messy, but that's the only way I could think of.
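The same substring idea can be folded into one loop that walks the style value declaration by declaration, which avoids repeating the block per property and guards against indexOf returning -1. This is only a sketch under a few assumptions: SubstringStyleCleaner and removeFromSpanStyle are hypothetical names, only the first span tag is handled, and the attribute is expected to be written exactly as style="..." with double quotes.

```java
public class SubstringStyleCleaner {

    /** True if name equals one of the listed property names, ignoring case. */
    private static boolean isListed(String name, String[] properties) {
        for (String property : properties) {
            if (property.equalsIgnoreCase(name)) {
                return true;
            }
        }
        return false;
    }

    /** Removes the listed properties from the style attribute of the first span tag. */
    public static String removeFromSpanStyle(String html, String... properties) {
        int tagStart = html.indexOf("<span");
        if (tagStart < 0) return html;
        int tagEnd = html.indexOf('>', tagStart);
        if (tagEnd < 0) return html;

        String marker = "style=\"";
        int styleStart = html.indexOf(marker, tagStart);
        // No style attribute inside this opening tag: nothing to do.
        if (styleStart < 0 || styleStart > tagEnd) return html;
        styleStart += marker.length();
        int styleEnd = html.indexOf('"', styleStart);
        if (styleEnd < 0) return html;

        String style = html.substring(styleStart, styleEnd);
        StringBuilder kept = new StringBuilder();
        int pos = 0;
        while (pos < style.length()) {
            // A declaration runs up to and including the next semicolon, if any.
            int semicolon = style.indexOf(';', pos);
            int end = (semicolon < 0) ? style.length() : semicolon + 1;
            String declaration = style.substring(pos, end);
            int colon = declaration.indexOf(':');
            String name = (colon < 0 ? declaration : declaration.substring(0, colon)).trim();
            if (!isListed(name, properties)) {
                kept.append(declaration);
            }
            pos = end;
        }
        return html.substring(0, styleStart) + kept + html.substring(styleEnd);
    }

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        System.out.println(removeFromSpanStyle(htmlFragment, "height", "cursor"));
    }
}
```

Comparing whole property names (rather than searching for "height:") also means a declaration such as line-height is left alone.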

Upvotes: 0
