Reputation: 4544
I want to break a String in Java using numbers as delimiters, but keep the numbers. A bit of research suggested that the split() method of String would be appropriate, but I failed to understand how to use it for this. To explain my question further, I'll use an example:
Input: 20.55|50|0.5|20|20.55
Required Output: ["20.55","|","50","|","0.5","|","20","|","20.55"]
By invoking split as in the example below, without lookahead or lookbehind, I get the output I expected (the numbers are consumed as delimiters):
expression.split("([0-9]+(\\.[0-9]+)?)")
Output: ["|","|","|","|"]
But if I try to do that with lookahead:
expression.split("(?=([0-9]+(\\.[0-9]+)?))")
Output: ["2","0.","5","5|","5","0|","0.","5|","2","0|","2","0.","5","5"]
And by using lookbehind I get an exception:
Exception in thread "main" java.util.regex.PatternSyntaxException: Look-behind group does not have an obvious maximum length near index 22 (?<=([0-9]+(.[0-9]+)?))
Can anyone explain this behaviour to me and suggest a solution?
PS: I know I can use the '|' to break the string, but this is just a silly example, I actually need a much more complex regex...
EDIT:
It seems to be impossible to do what I want because of the variable length of the delimiters. I was looking for a solution to a smaller problem that I could then apply to the rest of the exercise, so I will rephrase the question to see if there's a workaround, like the ones in the second and third answers:
I want to break a String in Java containing an arithmetic expression, and keep all its items. For example:
Input: 20.55 * 0.5 ** cos(360) + sin 0 * cos 90 + 1 * sin (180 + 90) * 0
Output: ["20.55", "*", "0.5", "**", "cos", "(", "360", ")", "+", "sin", "0", "*", "cos", "90", "+", "1", "*", "sin", "(", "180", "+", "90", ")", "*", "0"]
PPS: Please note that I have to use '**' for exponentiation.
EDIT 2: Following the answer given by anubhava, I found a solution that breaks an arithmetic expression into all its items:
Pattern p = Pattern.compile( "\\*\\*|sin|cos|tan|\\d+(?:\\.\\d+)?|[-()+*/%]" );
Matcher matcher = p.matcher(expression);
while (matcher.find()) {
    System.out.println(matcher.group());
}
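For reference, the loop above can be wrapped into a small helper that collects the tokens into a list instead of printing them. This is only a minimal sketch: the class name `ExpressionTokenizer` and the method `tokenize` are hypothetical names chosen for illustration, and the pattern is the one from EDIT 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpressionTokenizer {
    // Same alternation as in EDIT 2. "\\*\\*" is listed first so that
    // "**" is matched as one token rather than as two "*" tokens.
    private static final Pattern TOKEN =
            Pattern.compile("\\*\\*|sin|cos|tan|\\d+(?:\\.\\d+)?|[-()+*/%]");

    // Hypothetical helper: returns every match in order. Characters that
    // the pattern does not match (such as spaces) are silently skipped.
    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(expression);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [20.55, *, 0.5, **, cos, (, 360, ), +, sin, 0]
        System.out.println(tokenize("20.55 * 0.5 ** cos(360) + sin 0"));
    }
}
```

Because find() skips unmatched text, invalid input such as an unknown function name is dropped rather than reported, so a real parser would probably need an extra check for that.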
Upvotes: 3
Views: 1183
Reputation: 784958
You can use this lookaround based regex for splitting:
String[] toks = "20.55|50|0.5|20|20.55".split( "(?=[^\\d.])|(?<=[^\\d.])" );
for (String tok: toks)
System.out.printf("%s%n", tok);
Update:
You can use this regex for matching your tokens:
Pattern p = Pattern.compile( "sin|cos|tan|\\d+(?:\\.\\d+)?|[-()+*/%]" );
You can then use the Matcher#find() method in a while loop to get all the matched tokens.
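One detail worth checking when adapting this kind of pattern: alternation is tried left to right at each position, so a two-character operator such as the `**` required by the question has to be listed before the single `*`, otherwise it is found as two separate matches. A small sketch of the difference, where the class name `AlternationOrder` and the helper `findAll` are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrder {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "2 ** 3 * 4";
        // Only the character class: "**" comes out as two "*" matches.
        // Prints [2, *, *, 3, *, 4]
        System.out.println(findAll("\\d+(?:\\.\\d+)?|[-()+*/%]", input));
        // "\\*\\*" listed first: "**" is a single token.
        // Prints [2, **, 3, *, 4]
        System.out.println(findAll("\\*\\*|\\d+(?:\\.\\d+)?|[-()+*/%]", input));
    }
}
```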
Upvotes: 2
Reputation: 5395
Try with:
(?<=\d)(?=\|)|(?<=\|)(?=\d)
In Java:
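import java.util.Arrays;

public class RegexTest {
    public static void main(String[] args) {
        String string = "20.55|50|0.5|20|20.55";
        // Split between a digit and '|', in either order, without consuming any characters
        System.out.println(Arrays.toString(string.split("(?<=\\d)(?=\\|)|(?<=\\|)(?=\\d)")));
    }
}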
with result:
[20.55, |, 50, |, 0.5, |, 20, |, 20.55]
EDIT
To allow other characters next to the delimiter, such as "*" or the letters in "sin", you can change the regex to:
(?<=[0-9a-z*])(?=\|)|(?<=\|)(?=[0-9a-z*])
where [0-9a-z*] means a digit, a lowercase letter or "*". If you want to include other characters, just add them to the character class, like [0-9a-z*E], etc.
Upvotes: 1
Reputation: 36101
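The problem is that Java does not accept a lookbehind without an obvious maximum length, which is what the exception message says. The unbounded quantifiers + and * can match any number of characters, so a lookbehind that contains them is rejected. Many regex engines have a similar limitation.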
You can have lookaheads with variable length, however. But in your case this won't do the job, because lookarounds are zero-width: they don't consume the text they match, so the lookahead succeeds again before every following digit.
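To illustrate the point about lookarounds not consuming input, here is a minimal sketch. The class and method names are hypothetical, and the second pattern is not from the thread: it is one possible way to split only at the boundaries of a number, using single-character lookarounds that have a fixed length.

```java
import java.util.Arrays;

public class LookaroundSplitDemo {
    // The lookahead from the question: it succeeds before every digit,
    // because nothing is consumed between one attempt and the next.
    static String[] splitWithLookahead(String input) {
        return input.split("(?=[0-9]+(\\.[0-9]+)?)");
    }

    // Assumed alternative: split where a number starts (not preceded by a
    // digit or dot, followed by a digit) or where it ends (preceded by a
    // digit, not followed by a digit or dot).
    static String[] splitOnNumberBoundaries(String input) {
        return input.split("(?<![\\d.])(?=\\d)|(?<=\\d)(?![\\d.])");
    }

    public static void main(String[] args) {
        // On Java 8 and later this prints [2, 0., 5, 5|, 5, 0]
        System.out.println(Arrays.toString(splitWithLookahead("20.55|50")));
        // On Java 8 and later this prints [20.55, |, 50, |, 0.5]
        System.out.println(Arrays.toString(splitOnNumberBoundaries("20.55|50|0.5")));
    }
}
```

The second pattern assumes that numbers look like `20.55` or `50`; inputs such as `.5` or `5.` would need a different boundary definition.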
You want something that does:
([0-9]+(\\.[0-9]+)?)\\K
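What \K does, in engines that support it (such as PCRE and Perl), is throw away what was already matched. Therefore, you still split at specific positions, and the floating-point numbers are not broken up repeatedly. Note that java.util.regex does not support \K, so this pattern will not compile in Java as written.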
Upvotes: 1