zcaudate

Reputation: 14258

Regex that matches the same tag name in two places in the input

The use case is reformatting XML. I currently have a snippet that looks like this:

<dependencies>
    <dependency>
        <groupId>
             com.googlecode.java-diff-utils
        </groupId>
        <artifactId>
             diffutils
        </artifactId>
        <version>
             1.3.0
        </version>
    </dependency>
</dependencies>

I want it to look like this:

<dependencies>
    <dependency>
       <groupId>com.googlecode.java-diff-utils</groupId>
       <artifactId>diffutils</artifactId>
       <version>1.3.0</version>
    </dependency>
</dependencies>

So I want to match <tag></tag> pairs that contain only text, with no further pairs nested inside them, something like this:

output.replaceAll("<{TAG}>\\s+([^<>]+)\\s+</{TAG}>",
                  "<{TAG}>$1</{TAG}>")

where {TAG} stands for whatever tag name was matched, and must be the same in the opening and closing tag.

Upvotes: 1

Views: 54

Answers (1)

Jake Harmon

Reputation: 36

As others have stated, you shouldn't process XML with regex. It's far easier and more robust to use an XML parser.

However, since late-night regex is so fun, here's a simple one that would work here:

String output = oldStr.replaceAll("(?m)<(\\w+)>\\s+([^<>]*)$\\s+</\\1>", "<$1>$2</$1>");
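For anyone who wants to try this against the snippet from the question, here is a minimal, self-contained sketch. It uses a slightly more tolerant variant of the pattern (`\\s*` plus a reluctant `*?` in place of `(?m)` and `$`), so trailing spaces after the text are dropped as well. That variant and the names `LeafTagCollapser` and `collapse` are illustrative assumptions, not part of the one-liner above.

```java
import java.util.regex.Pattern;

final class LeafTagCollapser {
    // <tag>, optional whitespace, text without '<' or '>', optional whitespace, </tag>.
    // \1 is a back-reference: the closing tag must repeat the name captured by group 1.
    // The reluctant *? keeps surrounding whitespace out of group 2.
    private static final Pattern LEAF =
            Pattern.compile("<(\\w+)>\\s*([^<>]*?)\\s*</\\1>");

    // Hypothetical helper: collapses every text-only element onto a single line.
    static String collapse(String xml) {
        return LEAF.matcher(xml).replaceAll("<$1>$2</$1>");
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
                "<dependencies>",
                "    <dependency>",
                "        <groupId>",
                "             com.googlecode.java-diff-utils",
                "        </groupId>",
                "        <version>",
                "             1.3.0",
                "        </version>",
                "    </dependency>",
                "</dependencies>");
        System.out.println(collapse(xml));
    }
}
```

Elements that contain other elements are left alone because `[^<>]` cannot cross a nested `<`. Tags with attributes or hyphenated names are not matched at all, since `\\w+` only covers plain names.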

Again, don't use anything like that in production code. There are plenty of edge cases (attributes, comments, CDATA sections, self-closing tags) that would break almost any regex on XML.
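For the parser route, a minimal sketch using only the JDK's built-in DOM and identity transformer might look like the following. It treats an element as a "leaf" when its only child is a single text node, which is an assumption that fits the snippet in the question but not text split by entities or CDATA. The names `LeafTextTrimmer` and `trimLeafText` are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class LeafTextTrimmer {
    // Hypothetical helper: trims the text of every element whose only child is one text node.
    static String trimLeafText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // "*" selects every element in the document.
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Node el = all.item(i);
                Node first = el.getFirstChild();
                if (el.getChildNodes().getLength() == 1
                        && first.getNodeType() == Node.TEXT_NODE) {
                    first.setNodeValue(first.getNodeValue().trim());
                }
            }

            // Identity transform: serializes the DOM as-is, so the whitespace
            // between elements (the original indentation) is kept.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<dependency>\n    <version>\n         1.3.0\n    </version>\n</dependency>";
        System.out.println(trimLeafText(xml));
    }
}
```

Unlike the regex, this copes with attributes and comments because the parser handles the markup. It also rejects malformed input instead of silently passing it through.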

Upvotes: 2
