Reputation: 18245
I have a problem matching a string against a pattern in Java.
I want to check whether the string contains at least one word (and possibly several) from a list, e.g. [January, February, March], in any order, and capture the part of the string that follows each such word, up to the next word from the list. This should repeat until the end of the string.
// months = { "January", "December" } - to simplify, we have only two months
String str = "January I have a problem December There is no more sun";
In the example above I would finally select the following groups:
I have a problem (starts with January)
There is no more sun (starts with December)
I can't figure out how to define a pattern that matches up to the next occurrence of the same pattern in the string. This is my latest (not working) attempt:
Pattern pattern = Pattern.compile("[\\\bJanuary\\\b|\\\bDecember\\\b]\\s+(.+)");
Matcher matcher = pattern.matcher(str);
while(matcher.find()) {
System.out.println(matcher.group());
}
// should print:
// I have a problem
// There is no more sun
Upvotes: 2
Views: 138
Reputation: 1665
You can use lookarounds (lookbehind and lookahead) to solve your problem. Here is the concept:
1: Look behind to check if there is any month
2: Match any character lazily
3: Look ahead to check if there is any month or end of line
We construct a string joinedMonths of the form:
months = {"January", "December"};
joinedMonths = "(January|December|$)";
Using this we can use the regex:
(?<=joinedMonths).+?(?=joinedMonths)
Please note that joinedMonths here stands for the alternation string built above, not for the literal text 'joinedMonths'.
String line = "January I have a problem December There is no more sun";
String[] months = {"January", "December"};
// Construct the joined months string
String joinedMonths = String.join("|", months);
joinedMonths += ("|$");
// Initialize the regex pattern
String regexPattern = "(?<=(" + joinedMonths + ")).+?(?=(" + joinedMonths +"))";
Pattern pattern = Pattern.compile(regexPattern);
Matcher matcher = pattern.matcher(line);
// Print all the matches; trim() removes the spaces next to each month
while (matcher.find()) System.out.println(matcher.group().trim());
I have a problem
There is no more sun
Upvotes: 0
Reputation: 44398
You can use a Regex group to define a set of months to be used as a delimiter.
(January|February|March|...|...)
String[] months = { "January", "December" };
String monthsRegex = String.join("|", months);
Split the string on that delimiter, then clean up the parts using String#trim and a non-empty filter:
Arrays.stream(str.split("(" + monthsRegex + ")"))
    .map(String::trim)
    .filter(part -> part.length() > 0)
    .forEach(System.out::println);
This prints:
I have a problem
There is no more sun
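For completeness, the split approach can be wrapped into a self-contained sketch. The class name MonthSplitSketch and the helper splitOnMonths are hypothetical names chosen for illustration, and the use of Pattern.quote and \b word boundaries is an added assumption, not part of the snippet above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MonthSplitSketch {
    // Hypothetical helper: splits text on any of the given month names.
    static List<String> splitOnMonths(String text, String... months) {
        // Quote each name so it is treated literally, then join into an alternation.
        String alternation = Arrays.stream(months)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // \b keeps a month name from matching inside a longer word.
        String delimiter = "\\b(?:" + alternation + ")\\b";
        return Arrays.stream(text.split(delimiter))
                .map(String::trim)                 // drop the spaces next to each month
                .filter(part -> !part.isEmpty())   // drop the empty piece before the first month
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "January I have a problem December There is no more sun";
        splitOnMonths(str, "January", "December").forEach(System.out::println);
    }
}
```

One thing to keep in mind with this design: split also returns any text that appears before the first month, so if the input may start with unrelated words, a Matcher-based approach is probably the better fit.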
Upvotes: 1
Reputation: 521379
One approach would be to form a regex alternation using the input months, and then use an appropriate regex pattern to find the phrases you want.
List<String> months = Arrays.asList("January", "December");
String regexAlt = "\\b(?:" + String.join("|", months) + ")\\b";
Pattern pattern = Pattern.compile(regexAlt + "\\s+(.*?)(?=" + regexAlt + "|$)");
Matcher matcher = pattern.matcher("January I have a problem December There is no more sun");
while (matcher.find()) {
System.out.println("MATCH: " + matcher.group(1));
}
This prints:
MATCH: I have a problem
MATCH: There is no more sun
For an explanation, here is the full regex pattern we use above:
\b(?:January|December)\b\s+(.*?)(?=\b(?:January|December)\b|$)
This matches one month, then lazily captures the content that follows it, stopping at either the next month of interest or the end of the input.
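Since the question also mentions which month each phrase starts with, here is a minimal sketch of a variant that captures the month as well. The class name MonthPhraseSketch and the helper phrasesByMonth are hypothetical, and the capturing group around the month, the Pattern.quote call, and the trim() are additions to the pattern above rather than part of it:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MonthPhraseSketch {
    // Hypothetical helper: returns (month, phrase) pairs in order of appearance.
    static List<Map.Entry<String, String>> phrasesByMonth(String text, List<String> months) {
        // Quote each month so it is matched literally inside the alternation.
        String names = months.stream().map(Pattern::quote).collect(Collectors.joining("|"));
        // Group 1 is the month that starts the phrase, group 2 is the phrase itself.
        Pattern pattern = Pattern.compile(
                "\\b(" + names + ")\\b\\s+(.*?)(?=\\b(?:" + names + ")\\b|$)");
        List<Map.Entry<String, String>> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            // trim() drops the space the lazy group picks up before the next month.
            result.add(new SimpleImmutableEntry<>(matcher.group(1), matcher.group(2).trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "January I have a problem December There is no more sun";
        for (Map.Entry<String, String> e : phrasesByMonth(str, Arrays.asList("January", "December"))) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A list of entries is used instead of a map so that a month appearing more than once keeps all of its phrases.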
Upvotes: 1