Reputation: 19
I'm trying to match an XML element from its opening tag to its closing tag and replace it with an empty string. A sample XML document looks like this:
<lins>
<lin index="1"> ...<feature>Something</feature>... </lin>
<lin index="2">...<feature>Something</feature>... </lin>
<lin index="3">...<feature>Something</feature>....</lin>
<lin index="1">...<feature>Icom</feature>... </lin>
<lin index="2">...<feature>Icom</feature>... </lin>
</lins>
I need to remove everything from <lin> to </lin> whenever Icom appears in between.
My current pattern,
<lin\s(.+?Icom.+?)+</lin>
removes all of the lin items, because it matches from the first opening <lin> tag to the last closing </lin> tag. I would greatly appreciate any suggestions on how to do this. Note that I cannot use an XML parser in my situation.
Upvotes: 1
Views: 1159
Reputation: 1444
I think you need to add more groups to the regexp.
Add a group for the precondition that starts the match, then a group for the content in between, a group for Icom, and so on.
Off the top of my head, my regex would look like:
(<lin\ index\=)(\w+Icom\w+)(\<\/lin>)
Note that the escaping might be slightly off, but in essence you need more groups and less greedy quantifiers.
Upvotes: 0
Reputation: 11958
You can't do this reliably with a regexp, because tags can be nested.
Take this example:
<tag>
<tag> something </tag>
</tag>
<tag>
</tag>
If you use the regexp "<tag>(.*)</tag>", your group will be this:
<tag> something </tag>
</tag>
<tag>
and if you use the regexp "<tag>(.*?)</tag>", your group will be this:
<tag> something
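The two results above can be reproduced with a small Java sketch. The class and method names (GreedyLazyDemo, firstGroup) are made up for illustration, and the sketch assumes the pattern is compiled with Pattern.DOTALL so that "." also matches line breaks, which the example does not state explicitly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class GreedyLazyDemo {

    // Returns capture group 1 of the first match, or null if nothing matches.
    // Assumption: DOTALL is enabled so that "." can cross line breaks.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String xml = "<tag>\n<tag> something </tag>\n</tag>\n<tag>\n</tag>";

        // Greedy: runs from the first <tag> to the LAST </tag>.
        System.out.println("[" + firstGroup("<tag>(.*)</tag>", xml) + "]");

        // Lazy: runs from the first <tag> to the FIRST </tag>.
        System.out.println("[" + firstGroup("<tag>(.*?)</tag>", xml) + "]");
    }
}
```

Neither quantifier pairs the outer opening tag with its own closing tag, which is the point of the example.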
You should use something like a stack to find the closing tag that belongs to each opening tag.
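A minimal sketch of the stack idea, under some loud assumptions: tags carry no attributes, there are no comments or CDATA sections, and the names TagStackDemo and balancedElements are invented for this illustration. It is not a replacement for a real XML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class TagStackDemo {

    // Returns every balanced <tag>...</tag> element, in the order in which
    // the closing tags appear in the input.
    static List<String> balancedElements(String input, String tag) {
        // Matches an opening or a closing tag; group 1 is "/" for a closing tag.
        // Assumption: tags have no attributes or whitespace inside them.
        Pattern token = Pattern.compile("<(/?)" + Pattern.quote(tag) + ">");
        Matcher m = token.matcher(input);

        Deque<Integer> openStarts = new ArrayDeque<>(); // start offsets of unclosed tags
        List<String> elements = new ArrayList<>();

        while (m.find()) {
            if (m.group(1).isEmpty()) {
                // Opening tag: remember where it starts.
                openStarts.push(m.start());
            } else if (!openStarts.isEmpty()) {
                // Closing tag: it belongs to the most recent unclosed opening tag.
                int start = openStarts.pop();
                elements.add(input.substring(start, m.end()));
            }
            // A closing tag with nothing on the stack is unbalanced and is skipped.
        }
        return elements;
    }

    public static void main(String[] args) {
        String xml = "<tag>\n<tag> something </tag>\n</tag>\n<tag>\n</tag>";
        for (String element : balancedElements(xml, "tag")) {
            System.out.println("[" + element + "]");
        }
    }
}
```

Because each closing tag pops the most recent opening tag, the inner element is reported first, followed by the outer element that contains it.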
Upvotes: 0
Reputation: 336448
String result = subject.replaceAll("(?s)<lin\\b(?:(?!</lin).)*Icom(?:(?!</lin).)*</lin>", "");
This should work unless you have <lin> tags nested inside each other (or inside comments/strings).
Explanation:
<lin\b # Match <lin (but not link or linen)
(?: # Match...
(?!</lin) # as long as we're not at a closing tag
. # any character
)* # any number of times.
Icom # Match Icom
(?:(?!</lin).)* # (as above:) Match any character except closing tag
</lin> # Match closing tag
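To try the pattern on data shaped like the question's sample, here is a small self-contained sketch. The class and method names (LinRemover, removeIcomLins) and the shortened sample document are assumptions made for illustration; only the regex itself comes from the answer.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from the answer.
class LinRemover {

    // (?s) lets "." match line breaks; (?!</lin) stops each repetition from
    // running past the closing tag of the current <lin> element.
    private static final Pattern ICOM_LIN = Pattern.compile(
            "(?s)<lin\\b(?:(?!</lin).)*Icom(?:(?!</lin).)*</lin>");

    // Removes every non-nested <lin>...</lin> element that contains "Icom".
    static String removeIcomLins(String xml) {
        return ICOM_LIN.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Shortened, made-up version of the sample from the question.
        String xml = "<lins>\n"
                + "<lin index=\"1\"><feature>Something</feature></lin>\n"
                + "<lin index=\"2\"><feature>Icom</feature></lin>\n"
                + "</lins>";
        System.out.println(removeIcomLins(xml));
    }
}
```

Note that the line break after each removed element stays in the output, and that the enclosing <lins> element is left alone because \b does not match between "lin" and "s".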
Upvotes: 4