java specialist

Reputation: 104

How to remove tags using regex / pattern in java

I have the string "<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>", and I want to remove every occurrence of the pattern "<li>test<ul></ul>" (an <li> item immediately followed by an empty <ul></ul>). My desired output is "<li>test<ul><li>src<ul><li>org<ul>".

I tried the following:

public class Test {
    public static void main(String[] args) {
        String str = "<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>";
        str = str.replaceAll("(?s)<li>.*?<ul></ul>", "");
        System.out.println(str);
    }

}

but it did not work: the output I got was "<li>src<ul><li>org<ul>".

Upvotes: 0

Views: 238

Answers (2)

Mustofa Rizwan

Reputation: 10466

Try this pattern and replace each match with "":

public class Test {
    public static void main(String[] args) {
        String str = "<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>";
        // [^<]* keeps the match from running across another tag
        str = str.replaceAll("<li>([^<]*)<ul><\\/ul>", "");
        System.out.println(str);
    }
}

Edit:

Here's the explanation, as requested: the regex engine looks for whatever sits between <li> and <ul></ul>. [^<]* guarantees that there is no "<" in that stretch, so a match can never run across another tag. A lazy .*? is not enough here, because . can still match "<"; that is why the pattern in the question also removed <li>test<ul><li>model<ul></ul>.
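To make the difference concrete, here is a minimal sketch that runs the lazy pattern from the question and the negated-class pattern against the same input. The class name TagStripDemo and the two helper methods are only illustrative, and the negated-class pattern is written without the capturing group or the escaped slash, which should not change what it matches.

```java
public class TagStripDemo {
    // Lazy quantifier: '.' can still consume '<', so one match may span several tags.
    static String stripLazy(String html) {
        return html.replaceAll("(?s)<li>.*?<ul></ul>", "");
    }

    // Negated class: the text after <li> may not contain '<', so only an
    // <li> directly followed by an empty <ul></ul> is removed.
    static String stripNegated(String html) {
        return html.replaceAll("<li>[^<]*<ul></ul>", "");
    }

    public static void main(String[] args) {
        String str = "<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>";
        System.out.println(stripLazy(str));    // <li>src<ul><li>org<ul>
        System.out.println(stripNegated(str)); // <li>test<ul><li>src<ul><li>org<ul>
    }
}
```

The second line of output is the result the question asks for.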

Upvotes: 1

Ricky Mutschlechner

Reputation: 4409

I don't think you are quite grasping how RegExs work.

Take a look here: http://regexr.com/3ebpv

Basically, your regex is matching on two parts:

<li>test<ul></ul> and <li>test<ul><li>model<ul></ul>.

Thus leaving you with only: <li>src<ul><li>org<ul>
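If you would rather check this locally than on regexr, a small sketch like the following lists every substring the pattern from the question matches. The class name MatchLister and the helper findAll are made up for this example; Pattern, Matcher.find() and Matcher.group() are the standard java.util.regex calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchLister {
    // Collects every substring the regex matches, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String str = "<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>";
        // Prints the two matches, one per line.
        for (String match : findAll("(?s)<li>.*?<ul></ul>", str)) {
            System.out.println(match);
        }
    }
}
```

Everything printed here is what replaceAll deletes, which is why only the tail of the string survives.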

If you specifically want to remove the pattern <li>test<ul></ul>, then why not use that as the exact thing to replace? That isn't a regular expression; it's an exact string you want to find and replace. You're thinking way too hard about a simple problem.

This should suffice, no?

public class Test {
    public static void main(String[] args) {
        String str = "<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>";
        // Literal replacement of the exact substring; no regex needed
        str = str.replace("<li>test<ul></ul>", "");
        System.out.println(str);
    }
}
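For the exact-string case, a hedged sketch of the two usual options: String.replace treats its argument literally, while replaceAll expects a regex, so the literal has to be wrapped in Pattern.quote to get the same behavior. The class name LiteralReplaceDemo and the helper names are illustrative only.

```java
import java.util.regex.Pattern;

public class LiteralReplaceDemo {
    // String.replace takes a literal target, so no characters need escaping.
    static String removeLiteral(String input, String target) {
        return input.replace(target, "");
    }

    // replaceAll takes a regex; Pattern.quote turns the target into a literal pattern.
    static String removeQuoted(String input, String target) {
        return input.replaceAll(Pattern.quote(target), "");
    }

    public static void main(String[] args) {
        String str = "<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>";
        System.out.println(removeLiteral(str, "<li>test<ul></ul>"));
        // <li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>
        System.out.println(removeQuoted(str, "<li>test<ul></ul>"));
        // same output as above
    }
}
```

Note that this removes only the exact text <li>test<ul></ul>; the <li>model<ul></ul> part stays, so it differs from the desired output quoted in the question.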

Upvotes: 1
