Abraham Arnold

Reputation: 365

Get text inside brackets along with splitting delimiters in regex java?

I have a multiline string that is delimited by several different delimiters:

A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H

I need to split that string on the delimiters, but if some words are inside brackets, the whole bracket should be extracted as a single token even if it contains a delimiter. I need the tokens to be extracted as follows:

A Z
DelimiterB
B X
DelimiterA
(C DelimiterA D) (extract with brackets)
DelimiterB
(E DelimiterA F)
DelimiterB
G
DelimiterA
H

Currently I am using this expression to split on the delimiters while keeping them in the result:

(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB)))

I tried the following to handle the brackets as well, but it does not work. How can I make it work?

((?=\()|(?<=\))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))

Java code:

String txt = "A DelimiterB B DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
// Backslashes must be doubled inside a Java string literal: "\\(" and "\\)"
String[] texts = txt.split("((?=\\()|(?<=\\))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))");

for (String word : texts) {
    System.out.println(word);
}

Upvotes: 3

Views: 277

Answers (1)

samabcde

Reputation: 8114

IMO, matching is easier than splitting.

Since the delimiters themselves are also needed in the output, I suggest matching the tokens we want instead of splitting. Based on the example given, there are three patterns to capture:

  1. (C DelimiterA D) - a bracket containing a word, a delimiter and a word,
    which is "\\(\\w+ (DelimiterA|DelimiterB) \\w+\\)".
  2. DelimiterB - a whole delimiter,
    which is "(DelimiterA|DelimiterB)".
  3. B, B X - one or more words that are not delimiters.
    How do we check that a word is not a delimiter?
    We require that the whitespace between two words is neither preceded nor followed by a delimiter, using a negative lookbehind and a negative lookahead, which is "\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*".

import java.util.Scanner;

public class SplitWithCustomDelimiter {
    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        // Scanner.findAll requires Java 9 or later; a Scanner can also wrap other sources (files, streams)
        Scanner scanner = new Scanner(txt);
        scanner.findAll(
                "\\(\\w+ (DelimiterA|DelimiterB) \\w+\\)" +
                "|(DelimiterA|DelimiterB)" +
                "|\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*"
                )
                .map(matchResult -> matchResult.group()).forEach(System.out::println);
    }
}
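
If the brackets can hold more than "word delimiter word", or you are on Java 8 where Scanner.findAll is not available, a looser variant might help. The following is only a sketch under my own assumptions, not part of the approach above: brackets are never nested, everything from "(" to the next ")" counts as one token, and the class name BracketAwareTokenizer and the method tokenize are made up for illustration. It uses Pattern and Matcher instead of Scanner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class BracketAwareTokenizer {
    // Alternatives are tried left to right at each position:
    //   1. a bracketed group without nesting, kept whole
    //   2. one of the delimiters
    //   3. a run of characters in which no character starts a delimiter or a bracket
    private static final Pattern TOKEN = Pattern.compile(
            "\\([^)]*\\)"
            + "|DelimiterA|DelimiterB"
            + "|(?:(?!DelimiterA|DelimiterB|\\().)+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            // Runs of plain text carry the surrounding spaces, so trim them
            // and drop matches that were whitespace only.
            String token = matcher.group().trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        tokenize(txt).forEach(System.out::println);
    }
}
```

For the sample string this should give the same eleven tokens as listed in the question. Note that "." does not match line terminators by default, so in a multiline input each line is tokenized on its own and the line breaks are skipped.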

Upvotes: 1
