Zac

Reputation: 3285

Regex replace using a repetitive capture

I have a table like:

A | 1  
A | 2  
B | 1  
B | 2  
B | 3

I'm trying to transform it to look like this:

A { 1 | 2 }  
B { 1 | 2 | 3 }

I've come up with the following regex, which matches correctly, but I can't figure out how to get the repeated capture out.

(A|B)|(\d)(\r\n\1|(\d))*

UPDATE

I realize that this would be fairly trivial with some programming language, I was hoping to learn something more about regular expressions.

Upvotes: 2

Views: 376

Answers (1)

polygenelubricants

Reputation: 383746

Here is some Java code that may be helpful:

    String text =   "A | 1\n" +
                    "A | 2\n" +  
                    "B | 1\n" +
                    "B | 2\n" +
                    "B | 3\n" +
                    "A | x\n" +
                    "D | y\n" +
                    "D | z\n";
    String[] sections = text.split("(?<=(.) . .)\n(?!\\1)");
    StringBuilder sb = new StringBuilder();
    for (String section : sections) {
        sb.append(section.substring(0, 1) + " {")
          .append(section.substring(3).replaceAll("\n.", ""))
          .append(" }\n");
    }
    System.out.println(sb.toString());

This prints:

A { 1 | 2 }
B { 1 | 2 | 3 }
A { x }
D { y | z }

The idea is to do this in two steps:

  • First, split into sections
  • Then transform each section
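The two steps can also be kept apart as separate methods. The following is only a minimal sketch, assuming non-empty input with single-character keys and values laid out exactly as in the sample; the names GroupRows, splitSections, transform, and group are made up for illustration.

```java
public class GroupRows {
    // Step 1: split at a newline that follows a "K | V" row
    // and is NOT followed by the same key K (captured as group 1).
    static String[] splitSections(String text) {
        return text.split("(?<=(.) . .)\n(?!\\1)");
    }

    // Step 2: keep the key once, then drop each newline plus the
    // repeated key, so " 1\nA | 2" becomes " 1 | 2".
    static String transform(String section) {
        return section.substring(0, 1) + " {"
             + section.substring(3).replaceAll("\n.", "")
             + " }";
    }

    // Hypothetical helper that chains both steps, one output line per section.
    static String group(String text) {
        StringBuilder sb = new StringBuilder();
        for (String section : splitSections(text)) {
            sb.append(transform(section)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(group("A | 1\nA | 2\nB | 1\nB | 2\nB | 3\n"));
    }
}
```

Splitting first means the per-section regex never has to "remember" more than one key, which is why no repeated capture is needed.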

A single replaceAll variant

If you intersperse { and } in the input so that they can be captured and rearranged in the output, this is possible with a single replaceAll (i.e. an entirely regex solution):

    String text =   "{ A | 1 }" +
                    "{ A | 2 }" +
                    "{ B | 1 }" +
                    "{ B | 2 }" +
                    "{ B | 3 }" +
                    "{ C | 4 }" +
                    "{ D | 5 }";
    System.out.println(
        text.replaceAll("(?=\\{ (.))(?<!(?=\\1).{7})(\\{)( )(.) .|(?=\\}. (.))(?:(?<=(?=\\5).{6}).{5}|(?<=(.))(.))", "$4$3$2$7$6")
    );

This prints (see output on ideone.org):

A { 1 | 2 } B { 1 | 2 | 3 } C { 4 } D { 5 }

Unfortunately no, I don't think this is worth explaining. It's way too complicated for what's being accomplished. Essentially, though, it relies on lots of assertions, nested assertions, and capture groups (some of which will be empty strings depending on which assertion passes).
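The "empty capture group" part of the trick can be seen in isolation. This is a much smaller sketch than the regex above, not a breakdown of it: in Java's replaceAll, a group that sits in an alternative that did not match contributes nothing to the replacement. The class and method names (EmptyGroups, tag) are made up for illustration.

```java
public class EmptyGroups {
    // Group 1 takes part only when a letter matches,
    // group 2 only when a digit matches; the other one
    // is substituted as an empty string in the replacement.
    static String tag(String s) {
        return s.replaceAll("([a-z])|([0-9])", "<$1|$2>");
    }

    public static void main(String[] args) {
        System.out.println(tag("a1")); // prints <a|><|1>
    }
}
```

Because every group reference can appear in one shared replacement string, a single replaceAll can emit different output depending on which alternative matched.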

This is, without a doubt, the most complicated regex I've written.

Upvotes: 1
