Reputation: 752
I have a big ol' HTML file filled with stuff.
Somewhere in that file, there's a line like this:
<span class="xcomponent">pls do not delete me</span>
I need to get rid of the stuff but leave what is in between.
I'm using Java, and I assume the right approach is regex - I just don't really have enough experience with regex to pull this one off.
If it's any help, here's my 'stab in the dark' at it.
.*?(<span class="xcomponent">.*?</span>).*?
Upvotes: 0
Views: 121
Reputation: 9332
This is what you want:
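import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Inner quotes must be escaped; replaceAll returns a new string, so keep the result.
Pattern p = Pattern.compile("<span class=\"xcomponent\">(.*?)</span>");
Matcher m = p.matcher(html);
String result = m.replaceAll("$1");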
Upvotes: 1
Reputation: 1706
I assume the line always looks like
<Something>WHAT YOU WANT</closeSomething>
and you don't care about the something. Then the regex looks like:
<.*>.*</.*>
Use this regex with a matcher only to check that the line contains the pattern above. Then use the split method to split the line at each < and >:
string.split("<|>")[2]
The element you want is either the first, second, or third one. I didn't test this, so if the index is wrong, just play with it a bit.
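For what it's worth, here is a minimal sketch of the check-then-split idea as a runnable class. The class name `SplitDemo` and the helper `innerText` are made up for illustration, and it assumes the line holds exactly one tag pair with nothing before the opening tag.

```java
import java.util.regex.Pattern;

public class SplitDemo {
    // Hypothetical helper: returns the text between the tags, or null if the
    // line does not have the <tag>text</tag> shape.
    static String innerText(String line) {
        // Pattern.matches requires the whole line to match the regex.
        if (!Pattern.matches("<.*>.*</.*>", line)) {
            return null;
        }
        // Splitting on < or > gives: "", the opening tag, the text, the closing tag.
        // The leading empty string comes from the < at position 0.
        String[] parts = line.split("<|>");
        return parts[2];
    }

    public static void main(String[] args) {
        String line = "<span class=\"xcomponent\">pls do not delete me</span>";
        System.out.println(innerText(line));
    }
}
```

This only holds up for a single, simple tag pair on the line; nested tags or text before the opening tag would shift the index.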
Upvotes: 0
Reputation: 4543
I'm writing this from memory, so there may be some small errors:
Pattern p = Pattern.compile(".*?(<span class=\"xcomponent\">pls do not delete me</span>).*");
Matcher m = p.matcher(yourLine);
// matches() needs the whole line to match, hence the .*? and .* around the group
if (m.matches()) { yourLine = m.group(1); }
Feel free to move the parentheses in the regex inside the tag if you want to get rid of the tag too and keep only the inner text.
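A rough sketch of that variant, with the parentheses moved inside the tag so group 1 holds only the inner text. Note that it swaps in find() for matches(), so the surrounding .* is not needed, and it assumes you pass the whole HTML as one string (hence DOTALL, so that . can cross line breaks). The names `ExtractDemo` and `extract` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractDemo {
    // Group 1 is the text between the tags; DOTALL lets . match line terminators.
    private static final Pattern SPAN =
        Pattern.compile("<span class=\"xcomponent\">(.*?)</span>", Pattern.DOTALL);

    // Hypothetical helper: returns the inner text of the first matching span,
    // or null if the span is not present.
    static String extract(String html) {
        Matcher m = SPAN.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<html>\n<p>stuff</p>\n"
            + "<span class=\"xcomponent\">pls do not delete me</span>\n</html>";
        System.out.println(extract(html));
    }
}
```

This throws away everything except the span's contents, which is one reading of the question; if the rest of the file should stay, a replace is the better fit.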
Upvotes: 0
Reputation: 27943
myString.replaceAll("<span class=\"xcomponent\">(.*?)</span>", "$1")
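In case it helps, a small runnable sketch of this one-liner. It strips the span tags and keeps both the inner text and the rest of the HTML. The (?s) flag is an addition that is not in the line above: by default . does not match line terminators, so without it a span whose text runs over several lines would be left alone. `StripDemo` and `stripSpan` are made-up names.

```java
public class StripDemo {
    // Hypothetical helper: removes the xcomponent span tags but keeps their content.
    static String stripSpan(String html) {
        // (?s) turns on DOTALL; $1 refers to the text captured between the tags.
        return html.replaceAll("(?s)<span class=\"xcomponent\">(.*?)</span>", "$1");
    }

    public static void main(String[] args) {
        String html = "<p>before</p><span class=\"xcomponent\">pls do not delete me</span><p>after</p>";
        System.out.println(stripSpan(html));
    }
}
```

Since String is immutable, the result has to be assigned or returned; calling replaceAll on its own changes nothing.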
Upvotes: 0