Reputation: 931
I want a regex that removes a list of attributes from within the style attribute of a given HTML tag.
Example: I want to remove height and cursor from the style of a span tag.
Input:
String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
Expected output:
<span id="nav-askquestion" style="width:200px;" name="questions"><b>hh</b></span>
I have the following regex, but it removes every occurrence of height and cursor in the string, not just those inside the tag's style attribute:
String cleanString=htmlFragment.replaceAll("(height|cursor)[ ]*:[ ]*[^;]+;","");
I am not looking to use an HTML parser for this because of a specific requirement.
Upvotes: 0
Views: 2658
Reputation: 3507
I agree with others that it would be better to use HTML/XML parsers, which allow you to drill down to specific elements without worrying about any "accidental" regex matches.
However, having read Xlsx's comment, "You cannot use only one RegEx for this," I was compelled to post this solution using capture groups. It is for demonstration purposes only.
String reg = "(<span.+)((height|cursor) *:[^;]+;)(.*)((height|cursor) *:[^;]+;)(.*)";
String cleanString=htmlFragment.replaceAll(reg, "$1$4$7");
Obviously, it is not pretty and it may still match on some HTML content (as opposed to tags), but it is possible. Unless this is intended as a quick fix, I urge you to use a more appropriate solution as suggested by others. One possible solution would be jsoup.
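If a parser really is off the table, one way to keep the match from leaking into HTML content is to do it in two steps: first locate the style value of the opening span tag, then strip the unwanted declarations only inside that value. The following is a minimal sketch under stated assumptions, not a robust HTML handler: the class and method names (SpanStyleCleaner, clean) are made up for illustration, the style value is assumed to be double-quoted, and the lookbehind that protects names like line-height is an addition that is not in the regex above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpanStyleCleaner {
    // Group 1: "<span ... style=" plus the opening quote; group 2: the style value;
    // group 3: the closing quote. Assumes a double-quoted style attribute.
    private static final Pattern SPAN_STYLE = Pattern.compile(
            "(<span\\b[^>]*?\\sstyle\\s*=\\s*\")([^\"]*)(\")", Pattern.CASE_INSENSITIVE);

    // One "height: ...;" or "cursor: ...;" declaration. The lookbehind keeps
    // names such as "line-height" from matching on their suffix.
    private static final Pattern UNWANTED = Pattern.compile(
            "(?<![\\w-])(height|cursor)\\s*:[^;]*;?\\s*", Pattern.CASE_INSENSITIVE);

    static String clean(String html) {
        Matcher m = SPAN_STYLE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Only the captured style value is rewritten; the rest of the tag is copied as-is.
            String cleanedStyle = UNWANTED.matcher(m.group(2)).replaceAll("");
            // quoteReplacement stops '$' and '\' in the HTML from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + cleanedStyle + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        System.out.println(clean(htmlFragment));
    }
}
```

Unlike the single-expression version, this does not depend on both properties being present, and text such as "height: 10px;" outside a span tag is left alone. It is still regex over HTML, so unusual markup (single-quoted attributes, a ">" inside an attribute value) will defeat it.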
Upvotes: 1
Reputation: 11364
Try this regular expression:
\s*(height|cursor)\s*:\s*.+?\s*;\s*
You can test it out here.
If there are other attributes besides height and cursor that you want to capture, just keep adding them to the group, separated by bars: (background-color|height|font-size), etc.
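In Java, the alternation can be assembled from a list instead of being edited by hand. This is only a sketch: the class and method names (DeclarationPattern, forProperties) are hypothetical, and the pattern is the one shown above with the group filled in from the list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class DeclarationPattern {
    // Builds \s*(name1|name2|...)\s*:\s*.+?\s*;\s* from the given property names.
    static Pattern forProperties(List<String> names) {
        String alternation = names.stream()
                .map(Pattern::quote)            // treat each name as literal text
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\s*(" + alternation + ")\\s*:\\s*.+?\\s*;\\s*");
    }

    public static void main(String[] args) {
        Pattern p = forProperties(Arrays.asList("height", "cursor"));
        // Applied here to a bare style value, not to a whole HTML fragment.
        System.out.println(p.matcher("width:200px; cursor:default; height:100px;").replaceAll(""));
    }
}
```

Two caveats apply to the pattern itself. It matches anywhere in the input, so it has the same problem the question describes when run over a whole fragment. It also matches on a suffix: with height in the list, line-height:1; loses its height:1; part and leaves line- behind.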
Upvotes: 1
Reputation: 9648
As I said before, I strongly suggest not using RegEx for this. Use an HTML/XML parser to parse the tags and data, and then do all your operations on the parsed result.
But if you don't want to do that for some reason, then I would suggest falling back to basic substring-based methods rather than RegEx.
Here is a sample code snippet for the above situation:
public class RemoveStyleProperties {
    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        // Bounds of the span element; 7 is the length of "</span>".
        int startIndex = htmlFragment.indexOf("<span");
        int stopIndex = htmlFragment.indexOf("</span>") + 7;
        /* Cursor: drop everything from "cursor:" through the next ';' (assumes the property is present) */
        int cursorStart = htmlFragment.indexOf("cursor:", startIndex);
        int cursorEnd = htmlFragment.indexOf(";", cursorStart);
        htmlFragment = new StringBuilder()
                .append(htmlFragment.substring(startIndex, cursorStart))
                .append(htmlFragment.substring(cursorEnd + 1, stopIndex))
                .toString();
        /* Update Indices: the rebuilt string now begins at "<span", so the start index is 0 */
        startIndex = 0;
        stopIndex = htmlFragment.indexOf("</span>") + 7;
        /* Height: same steps as for cursor */
        int heightStart = htmlFragment.indexOf("height:", startIndex);
        int heightEnd = htmlFragment.indexOf(";", heightStart);
        htmlFragment = new StringBuilder()
                .append(htmlFragment.substring(startIndex, heightStart))
                .append(htmlFragment.substring(heightEnd + 1, stopIndex))
                .toString();
        /* Output */
        System.out.println(htmlFragment);
    }
}
I know it looks a bit messy but that's the only way I could think of.
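The per-property steps in the snippet above could probably be folded into one helper that handles any number of property names and only looks inside the style value. This is a rough sketch, assuming a single double-quoted style attribute in the fragment; the names SubstringStyleCleaner and removeFromStyle are made up for illustration. It uses split(";") for brevity, with a literal single-character separator.

```java
final class SubstringStyleCleaner {
    // Removes the named declarations from the first style="..." value in the fragment.
    static String removeFromStyle(String html, String... propertyNames) {
        String marker = "style=\"";
        int valueStart = html.indexOf(marker);
        if (valueStart < 0) return html;             // no style attribute: nothing to do
        valueStart += marker.length();
        int valueEnd = html.indexOf('"', valueStart);
        if (valueEnd < 0) return html;               // unterminated attribute: leave untouched

        StringBuilder kept = new StringBuilder();
        for (String declaration : html.substring(valueStart, valueEnd).split(";")) {
            int colon = declaration.indexOf(':');
            String name = (colon < 0 ? declaration : declaration.substring(0, colon)).trim();
            if (name.isEmpty() || isListed(name, propertyNames)) continue;
            kept.append(declaration.trim()).append(';');
        }
        // Everything outside the style value is copied through unchanged.
        return html.substring(0, valueStart) + kept + html.substring(valueEnd);
    }

    private static boolean isListed(String name, String[] names) {
        for (String n : names) {
            if (n.equalsIgnoreCase(name)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        System.out.println(removeFromStyle(htmlFragment, "height", "cursor"));
    }
}
```

Because property names are compared whole, line-height is not affected when height is listed, and a property that is absent is simply skipped instead of producing a negative index.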
Upvotes: 0