Oleg Cherednik

Reputation: 18245

Check string for regular expression and select a group

I have a problem matching a string against a pattern in Java.

I want to check whether the string contains at least one word (and find as many of them as possible) from a list, e.g. [January, February, March], in any order, and capture the part of the string after each such word up to the next occurrence of one of them (repeating this until the end of the string).

String[] months = { "January", "December" }; // to simplify, only two months
String str = "January I have a problem December There is no more sun";

In the example above, I would like to end up with the following groups: "I have a problem" and "There is no more sun".

I can't figure out how to define a pattern that matches up to the next occurrence of the same pattern in the string. This is my latest (non-working) attempt:

Pattern pattern = Pattern.compile("[\\\bJanuary\\\b|\\\bDecember\\\b]\\s+(.+)");
Matcher matcher = pattern.matcher(str);

while(matcher.find()) {
    System.out.println(matcher.group());
}

// should print:
// I have a problem
// There is no more sun
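A small sketch of why the attempt above fails (the class and method names are made up for illustration): the square brackets turn the month list into a character class that matches a single character, and the greedy `(.+)` consumes everything up to the end of the input, including later months. Note also that `matcher.group()` returns the whole match, month included; the captured text is `matcher.group(1)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttemptDiagnosis {
    // Square brackets define a character class: "[January]" matches ONE
    // character from the set {J, a, n, u, r, y}, not the word "January".
    static boolean charClassMatches(String input) {
        return Pattern.matches("[January]", input);
    }

    // A grouped alternation matches whole words instead.
    static boolean alternationMatches(String input) {
        return Pattern.matches("(?:January|December)", input);
    }

    // A greedy (.+) runs to the end of the input, swallowing any later months.
    static String greedyTail(String input) {
        Matcher m = Pattern.compile("(?:January|December)\\s+(.+)").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(charClassMatches("y"));         // true
        System.out.println(charClassMatches("January"));   // false
        System.out.println(alternationMatches("January")); // true
        System.out.println(greedyTail("January I have a problem December There is no more sun"));
        // I have a problem December There is no more sun
    }
}
```

The fix is therefore a group instead of a character class, plus a non-greedy match that stops at the next month, which is what the answers below do.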

Upvotes: 2

Views: 138

Answers (3)

Anuj

Reputation: 1665

Concept

You can use lookaround assertions to solve your problem. Here is the concept:

1: Look behind to check that a month precedes the match
2: Match any characters lazily
3: Look ahead to check that a month or the end of the input follows

We construct a string joinedMonths of the form:

months = {"January", "December"};
joinedMonths = "(January|December|$)";

With this, the regex becomes:

(?<=joinedMonths).+?(?=joinedMonths)

Note that joinedMonths here stands for the pattern string built above, not the literal text 'joinedMonths'.


Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "January I have a problem December There is no more sun";
String[] months = {"January", "December"};

// Construct the joined months string
String joinedMonths = String.join("|", months);
joinedMonths += ("|$");

// Initialize the regex pattern
String regexPattern = "(?<=(" + joinedMonths + ")).+?(?=(" + joinedMonths +"))";

Pattern pattern = Pattern.compile(regexPattern);
Matcher matcher = pattern.matcher(line);

// Print all the matches
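// The lookarounds are zero-width, but the spaces next to each month are matched, so trim them
while (matcher.find()) System.out.println(matcher.group().trim());

Output

I have a problem
There is no more sun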

Upvotes: 0

Nikolas

Reputation: 44398

You can use a regex group containing an alternation of the months as the delimiter for String#split:

(January|February|March|...|...)
String[] months = { "January", "December" };
String monthsRegex = String.join("|", months);

Then split the string on that group, trim each part with String#trim, and keep only the non-empty parts:

Arrays.stream(str.split("(" + monthsRegex + ")"))
       .map(String::trim)
       .filter(part -> part.length() > 0)
       .forEach(System.out::println);

This prints:

I have a problem
There is no more sun
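One caveat with splitting on a plain alternation: a month name inside a longer word (for example "March" in "Marching") also becomes a delimiter. A minimal sketch, assuming you only want whole-word months, wraps the alternation in `\b` word boundaries and passes each month through `Pattern.quote`; `MonthSplitter` and `splitOnMonths` are hypothetical names, not part of the answer above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MonthSplitter {
    // Hypothetical helper: split the input on whole-word months only.
    static List<String> splitOnMonths(String input, String... months) {
        // Pattern.quote makes each month a literal, even if it contained regex metacharacters.
        String monthsRegex = Arrays.stream(months)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // \b on both sides means "March" does not split "Marching".
        return Arrays.stream(input.split("\\b(?:" + monthsRegex + ")\\b"))
                .map(String::trim)
                .filter(part -> !part.isEmpty()) // drops the empty part before a leading month
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitOnMonths(
                "March We were Marching home January I have a problem",
                "January", "February", "March"));
        // [We were Marching home, I have a problem]
    }
}
```

Without the `\b` boundaries, the same input would also be split inside "Marching".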

Upvotes: 1

Tim Biegeleisen

Reputation: 521379

One approach is to build a regex alternation from the input months and then use it in a pattern that finds the phrases you want.

import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> months = Arrays.asList("January", "December");
String regexAlt = "\\b(?:" + String.join("|", months) + ")\\b";
Pattern pattern = Pattern.compile(regexAlt + "\\s+(.*?)(?=" + regexAlt + "|$)");
Matcher matcher = pattern.matcher("January I have a problem December There is no more sun");
while (matcher.find()) {
    System.out.println("MATCH: " + matcher.group(1));
}

This prints:

MATCH: I have a problem 
MATCH: There is no more sun

For an explanation, here is the full regex pattern we use above:

\b(?:January|December)\b\s+(.*?)(?=\b(?:January|December)\b|$)

This captures the text that follows a month, up to either the next month of interest or the end of the input.
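If you also need to know which month each phrase belongs to, the same idea can capture the month too. The sketch below is a variation on the pattern above, not part of the original answer: group 1 holds the month, group 2 the phrase, and a `\s*` before the lookahead keeps the trailing space out of the phrase. `MonthPhrases` and `monthPhrases` are hypothetical names, and the months are assumed to contain no regex metacharacters (otherwise quote them with `Pattern.quote`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonthPhrases {
    // Hypothetical helper: returns "Month: phrase" entries in input order.
    static List<String> monthPhrases(String input, List<String> months) {
        String alt = String.join("|", months);
        // Group 1 captures the month, group 2 the lazily matched phrase after it.
        // The lookahead uses a non-capturing group so it adds no extra groups.
        Pattern pattern = Pattern.compile(
                "\\b(" + alt + ")\\b\\s+(.*?)\\s*(?=\\b(?:" + alt + ")\\b|$)");
        Matcher matcher = pattern.matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group(1) + ": " + matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(monthPhrases(
                "January I have a problem December There is no more sun",
                Arrays.asList("January", "December")));
        // [January: I have a problem, December: There is no more sun]
    }
}
```

Returning a list rather than a map keeps the input order and allows the same month to appear more than once.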

Upvotes: 1
