arcologies

Reputation: 752

REGEX - removing stuff around something?

I have a big ol' HTML file filled with stuff.

Somewhere in that file, there's a line like this:

<span class="xcomponent">pls do not delete me</span>

I need to get rid of the span tags but keep the text in between.

I'm using Java, and I assume the right approach is regex - I just don't really have enough experience with regex to pull this one off.

If it's any help, here's my 'stab in the dark' at it.

.*?(<span class="xcomponent">.*?</span>).*?
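For reference, here is a minimal sketch of what this attempt captures, assuming it is used with Matcher.find() and the inner quotes are escaped for Java (the class AttemptCheck and method captured are hypothetical names). Because the capturing group wraps the whole span, group 1 still contains the tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration: shows what the question's pattern captures.
class AttemptCheck {
    // The pattern from the question, with the inner quotes escaped for Java.
    private static final Pattern ATTEMPT =
            Pattern.compile(".*?(<span class=\"xcomponent\">.*?</span>).*?");

    // Returns group 1 of the first match, or null when the pattern is not found.
    static String captured(String line) {
        Matcher m = ATTEMPT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "<p><span class=\"xcomponent\">pls do not delete me</span></p>";
        // prints <span class="xcomponent">pls do not delete me</span>
        System.out.println(captured(line));
    }
}
```

Moving the group so that it wraps only the inner .*? is what makes it capture just the text.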

Upvotes: 0

Views: 121

Answers (4)

Jacob Eggers

Reputation: 9332

This is what you want:

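import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("<span class=\"xcomponent\">(.*?)</span>");
Matcher m = p.matcher(html);
String result = m.replaceAll("$1"); // replaceAll returns a new string with each span replaced by its text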
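A self-contained sketch of the Pattern/Matcher approach above, wrapped in a hypothetical SpanUnwrapper class. It assumes the attribute is written exactly as class="xcomponent", that spans are not nested (the reluctant .*? stops at the first </span>), and that the span's text sits on one line, since . does not match line breaks by default. Everything outside the matched spans is left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class around the Pattern/Matcher approach above.
class SpanUnwrapper {
    // Group 1 captures the text between the opening and closing tags.
    private static final Pattern SPAN =
            Pattern.compile("<span class=\"xcomponent\">(.*?)</span>");

    // Replaces every matching span with its inner text and leaves the rest of the HTML alone.
    static String unwrap(String html) {
        Matcher m = SPAN.matcher(html);
        return m.replaceAll("$1");
    }

    public static void main(String[] args) {
        String html = "<div>\n<span class=\"xcomponent\">pls do not delete me</span>\n</div>";
        System.out.println(unwrap(html));
    }
}
```

Compiling the Pattern once as a static field avoids recompiling it for every call, which helps when the same replacement runs over many files.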

Upvotes: 1

Christian

Reputation: 1706

I assume the line always looks like

<Something>WHATYOU WANT</closeSomething>

and you don't care what the tag is. Then the regex looks like:

<.*>.*</.*>

First use a Matcher with this pattern to make sure the line has that shape. Then split the line at every < and >:

string.split("<|>")[2]

For a line that starts with the opening tag, the text you want is at index 2 (index 0 is the empty string before the first <). I haven't tested this, so if it's off, just experiment a bit.
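A minimal sketch of the check-then-split approach, using the hypothetical names SplitExtract and extract. It assumes the whole line is a single element with no nested tags and no < or > inside the text; returning null for lines that don't match is a choice made here, not part of the answer.

```java
// Hypothetical helper sketching the matches-then-split approach described above.
class SplitExtract {
    // Returns the inner text of a line shaped like <tag ...>text</tag>, or null otherwise.
    static String extract(String line) {
        // String.matches requires the whole line to match the pattern.
        if (!line.matches("<.*>.*</.*>")) {
            return null;
        }
        // For "<span ...>text</span>" the parts are "", "span ...", "text", "/span".
        String[] parts = line.split("<|>");
        return parts[2];
    }

    public static void main(String[] args) {
        // prints pls do not delete me
        System.out.println(extract("<span class=\"xcomponent\">pls do not delete me</span>"));
    }
}
```

The leading empty string appears because the line starts with a delimiter; String.split only drops trailing empty strings.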

Upvotes: 0

Rostislav Matl

Reputation: 4543

I'm writing this from memory, so there may be some small errors:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile(".*?(<span class=\"xcomponent\">pls do not delete me</span>).*");
Matcher m = p.matcher(yourLine);
if (m.matches()) { yourLine = m.group(1); }

Feel free to move the parentheses in the regex inside the tag if you want to get rid of the tag too and keep only the inner text.
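A sketch of the variant with the parentheses moved inside the tag, under two assumptions: the literal text is replaced by a reluctant .*? so any content is captured, and a non-matching line is returned unchanged (InnerText and innerText are hypothetical names). Because matches() consumes the whole line, anything else on that line is discarded, not just the tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the matches() approach with the group moved inside the tag.
class InnerText {
    // The group now wraps only the text between the tags, so the tags are not captured.
    private static final Pattern INNER =
            Pattern.compile(".*?<span class=\"xcomponent\">(.*?)</span>.*");

    // Returns the inner text if the whole line matches, otherwise the line unchanged.
    static String innerText(String line) {
        Matcher m = INNER.matcher(line);
        return m.matches() ? m.group(1) : line;
    }

    public static void main(String[] args) {
        // prints pls do not delete me
        System.out.println(innerText("<p><span class=\"xcomponent\">pls do not delete me</span></p>"));
    }
}
```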

Upvotes: 0

agent-j

Reputation: 27943

myString = myString.replaceAll("<span class=\"xcomponent\">(.*?)</span>", "$1");
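The one-liner in context, as a hedged sketch with a hypothetical ReplaceAllDemo class. Java strings are immutable, so the result of replaceAll has to be assigned back. The (?s) inline flag is an assumption added here: it makes . match line breaks, so a span whose text wraps onto a second line is also unwrapped.

```java
// Hypothetical wrapper around the String.replaceAll one-liner above.
class ReplaceAllDemo {
    // (?s) lets '.' match line breaks, so spans whose text spans lines are handled too.
    static String unwrapSpans(String html) {
        // replaceAll returns a new String; the argument itself is never modified.
        return html.replaceAll("(?s)<span class=\"xcomponent\">(.*?)</span>", "$1");
    }

    public static void main(String[] args) {
        String myString = "<p><span class=\"xcomponent\">pls do not\ndelete me</span></p>";
        myString = unwrapSpans(myString);
        System.out.println(myString);
    }
}
```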

Upvotes: 0
