Reputation: 363
I'm trying to group two sub-sentences of any reasonable length separated by a specific word (in the example "AND"), where the second one is optional. Some examples:
CASE1:
foo sentence A AND foo sentence B
shall give
"foo sentence A" --> matching group 1
"AND" --> matching group 2 (optionally)
"foo sentence B" --> matching group 3
CASE2:
foo sentence A
shall give
"foo sentence A" --> matching group 1
"" --> matching group 2 (optionally)
"" --> matching group 3
I tried the following regex
(.*) (AND (.*))?$
and it works, but only if, in CASE2, I put a trailing space at the end of the string; otherwise the pattern doesn't match. If I move the space before "AND" inside the optional group, then in CASE1 the matcher puts the whole string in the first group. I looked into lookahead and lookbehind assertions, but I'm not sure they can help me. Any suggestions? Thanks
Upvotes: 1
Views: 728
Reputation: 91488
I'd use this regex:
^(.*?)(?: (AND) (.*))?$
explanation:
The regular expression:
(?-imsx:^(.*?)(?: (AND) (.*))?$)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      AND                      'AND'
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
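To see the pattern above on both cases from the question, here is a minimal Java sketch. The class name OptionalAndSplitter and the helper parse are made up for illustration, and turning unmatched groups into empty strings is an assumption based on what the question asks for (Matcher.group returns null for a group that did not participate in the match).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class OptionalAndSplitter {
    // The lazy first group stops at the first " AND " it can find.
    private static final Pattern PATTERN =
            Pattern.compile("^(.*?)(?: (AND) (.*))?$");

    // Returns {group 1, group 2, group 3}; unmatched optional groups become "".
    static String[] parse(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            // Can happen, e.g., when the input contains a line break.
            throw new IllegalArgumentException("No match for: " + input);
        }
        String[] groups = new String[3];
        for (int i = 0; i < 3; i++) {
            String g = m.group(i + 1);      // null if the optional part is absent
            groups[i] = (g == null) ? "" : g;
        }
        return groups;
    }

    public static void main(String[] args) {
        // CASE1 from the question
        System.out.println(String.join("|", parse("foo sentence A AND foo sentence B")));
        // CASE2 from the question
        System.out.println(String.join("|", parse("foo sentence A")));
    }
}
```

Because the first group is lazy, an input with several ANDs is cut at the first one and the remainder ends up in group 3.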
Upvotes: 2
Reputation: 15010
This regex will return the requested string parts in the requested groups. The "and" is optional; if it's not found in the string, then the entire string is placed into group 1. The \s*? parts are intended to trim white space from the captured groups.
^\s*?\b(.*?)\b\s*?(?:\b(and)\b\s*?\b(.*?)\b\s*?)?$
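Group 0 gets the entire matching string
Group 1 gets the text before the "and"; if there is no "and" then the entire string appears here
Group 2 gets the "and"
Group 3 gets the text after the "and"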
Case 1
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Module1{
public static void main(String[] asd){
String sourcestring = "foo sentence A AND foo sentence B";
Pattern re = Pattern.compile("^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$",Pattern.CASE_INSENSITIVE);
Matcher m = re.matcher(sourcestring);
if(m.find()){
for( int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++ ){
System.out.println( "[" + groupIdx + "] = " + m.group(groupIdx));
}
}
}
}
$matches Array:
(
[0] => foo sentence A AND foo sentence B
[1] => foo sentence A
[2] => AND
[3] => foo sentence B
)
Case 2, using the same regex
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Module1{
public static void main(String[] asd){
String sourcestring = "foo sentence A";
Pattern re = Pattern.compile("^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$",Pattern.CASE_INSENSITIVE);
Matcher m = re.matcher(sourcestring);
if(m.find()){
for( int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++ ){
System.out.println( "[" + groupIdx + "] = " + m.group(groupIdx));
}
}
}
}
$matches Array:
(
[0] => foo sentence A
[1] => foo sentence A
)
Upvotes: 2
Reputation: 425208
Change your regex to make the space after the first sentence optional, and make the first group lazy so that it does not swallow the AND part:
(.*?\\S) ?(AND (.*))?$
Or you could use split() to consume the AND and any surrounding spaces:
String[] sentences = sentence.split("\\s*AND\\s*");
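A small sketch of the split() approach, assuming you want exactly two results with an empty string standing in for a missing second sentence. The class name SplitOnAnd and the limit argument of 2 are additions for illustration; the limit keeps any later AND inside the second sentence.

```java
// Hypothetical helper class, not part of the original answer.
final class SplitOnAnd {
    // Returns {first sentence, second sentence}; the second is "" when there is no AND.
    static String[] sentences(String sentence) {
        // Limit 2: split at the first AND only, dropping the spaces around it.
        String[] parts = sentence.split("\\s*AND\\s*", 2);
        String second = (parts.length > 1) ? parts[1] : "";
        return new String[] {parts[0], second};
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", sentences("foo sentence A AND foo sentence B")));
        System.out.println(String.join("|", sentences("foo sentence A")));
    }
}
```

Unlike the capturing-group regex, this does not return the AND itself, so it only fits if you don't need that word as a separate result.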
Upvotes: 0
Reputation: 195199
Your case 2 is a little strange, but I would do:
String[] parts = sentence.split("(?<=AND)|(?=AND)");
Then check parts.length. If the length is 1, it is case 2: the array holds just the sentence, and you could add empty strings as your "group 2/3". In case 1 you get the parts directly:
[foo sentence A , AND, foo sentence B]
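The length check described above could be sketched like this. The class name LookaroundSplit, the trim() calls and the limit of 3 are assumptions added for illustration: the lookaround split keeps the spaces next to AND, and the limit stops splitting after the first AND.

```java
// Hypothetical helper class, not part of the original answer.
final class LookaroundSplit {
    // Always returns three entries: {before AND, "AND", after AND}, padded with "".
    static String[] groups(String sentence) {
        // Zero-width split just before and just after AND; the word itself is kept.
        String[] parts = sentence.split("(?<=AND)|(?=AND)", 3);
        String[] result = {"", "", ""};
        for (int i = 0; i < parts.length; i++) {
            result[i] = parts[i].trim();   // drop the spaces left around AND
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", groups("foo sentence A AND foo sentence B")));
        System.out.println(String.join("|", groups("foo sentence A")));
    }
}
```

Note that the pattern has no word boundaries, so a word that merely contains AND would also be split.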
Upvotes: 0
Reputation: 7507
How about just using
String split[] = sentence.split("AND");
That will split the sentence at your word and give you an array of subparts.
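Two things to keep in mind with a plain split("AND"): the spaces around the word stay in the parts, and the word is also matched inside longer words. The sketch below compares it with a whole-word variant; the class name WordBoundarySplit and the pattern with \b and \s* are suggestions for illustration, not part of the answer above.

```java
// Hypothetical helper class, not part of the original answer.
final class WordBoundarySplit {
    // Plain split: keeps surrounding spaces and also splits inside words.
    static String[] naive(String sentence) {
        return sentence.split("AND");
    }

    // Splits only on AND as a whole word and drops the spaces around it.
    static String[] wholeWord(String sentence) {
        return sentence.split("\\s*\\bAND\\b\\s*");
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(naive("foo sentence A AND foo sentence B")));
        System.out.println(java.util.Arrays.toString(naive("BRAND new idea")));
        System.out.println(java.util.Arrays.toString(wholeWord("foo sentence A AND foo sentence B")));
        System.out.println(java.util.Arrays.toString(wholeWord("BRAND new idea")));
    }
}
```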
Upvotes: 2