Michael Gierer

Reputation: 443

Replace all tags except one with RegExp in Java

I have the following problem: I want to delete all substrings that start with < and end with >, except the substring <back>.

Example: <apps> <up> <down> <capital> ... should be deleted, but not <back>.

I am sure this can be done with a regular expression and String.replaceAll(), but I don't know how.

So far, I have come up with this:

line = line.replaceAll("<[^<]*>", "");

The problem is that this also deletes the <back> substring!

I hope one of you knows a solution.

Thanks for the help!

Upvotes: 3

Views: 366

Answers (2)

Wiktor Stribiżew

Reputation: 626738

Use a negative lookahead:

line = line.replaceAll("<(?!back>)[^<>]*>", "");
                         ^^^^^^^^^

See the regex demo.

The pattern matches:

  • < - the < symbol
  • (?!back>) - only if it is not immediately followed by back> (this negative lookahead is a zero-width assertion: it checks the text to the right of the current location without consuming it)
  • [^<>]* - zero or more chars other than > and <
  • > - a > symbol.
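
As a minimal sketch of the pattern described above, the class below wraps it in a helper. The class name TagStripper and the method name stripTagsExceptBack are made up for illustration and are not part of the original answer; only the regex itself comes from it.

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Regex from the answer: "<", not followed by "back>", then any
    // characters other than "<" and ">", then the closing ">".
    private static final Pattern NON_BACK_TAG =
            Pattern.compile("<(?!back>)[^<>]*>");

    // Hypothetical helper: removes every <...> substring except <back>.
    public static String stripTagsExceptBack(String line) {
        return NON_BACK_TAG.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // <apps> and <up> are removed, <back> is kept.
        System.out.println(stripTagsExceptBack("<apps>a<back>b<up>")); // a<back>b
        // The lookahead includes ">", so <backup> is not protected.
        System.out.println(stripTagsExceptBack("<backup>x"));          // x
    }
}
```

Because the lookahead checks for back> rather than just back, a tag such as <backup> is still removed; only the exact tag <back> survives.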

Upvotes: 3

Pavneet_Singh

Reputation: 37404

You can use (?!<back>)<[^<]*>:

line = line.replaceAll("(?!<back>)<[^<]*>", "");

(?!<back>) is a negative lookahead: it stops a match from starting at the tag <back>, so that tag is left in place.
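
The following is a small sketch, not part of either answer, that runs both patterns from this thread side by side. The class name LookaheadPlacement, the constants INSIDE and BEFORE, and the helper strip are hypothetical names chosen for the example. It suggests that the two patterns agree on ordinary input but can differ when a stray > appears in the text, because [^<]* also matches > characters.

```java
public class LookaheadPlacement {
    // Lookahead placed after "<"; the character class excludes both "<" and ">".
    static final String INSIDE = "<(?!back>)[^<>]*>";
    // Lookahead placed before "<"; the character class excludes only "<".
    static final String BEFORE = "(?!<back>)<[^<]*>";

    // Hypothetical helper: removes every match of regex from line.
    public static String strip(String regex, String line) {
        return line.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Both patterns keep <back> and drop the other tags.
        System.out.println(strip(INSIDE, "<apps>a<back>b<up>")); // a<back>b
        System.out.println(strip(BEFORE, "<apps>a<back>b<up>")); // a<back>b

        // With a stray ">" in the text, the greedy [^<]* runs past the
        // end of the tag and removes more than the tag itself.
        System.out.println(strip(INSIDE, "<a>x>y")); // x>y
        System.out.println(strip(BEFORE, "<a>x>y")); // y
    }
}
```

If the input can contain a bare > outside of tags, excluding > from the character class, as in the first pattern, is probably the safer choice.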

RegEx Demo

Upvotes: 4
