Reputation: 537
I'm new to regex, so sorry for the beginner question.
My problem is that I want to group the data in a String.
What I want to get is the following:
a-z A-Z, or a-z A-Z 0-9 (ex: abc, bzc15 but not 1abc or 14bc)
0-9 (ex: 1, 23, 56, etc.)
+, *, -, /, ( and )
I want to group them in an array and preserve their position if possible.
Ex:
String test = "a + b + 6";
The result should be something like this
Array[0] = a
Array[1] = White Space
Array[2] = +
Array[3] = White Space
Array[4] = b
Array[5] = White Space
Array[6] = +
Array[7] = White Space
Array[8] = 6
Is this possible? If yes, what pattern should I use?
Any help will be appreciated.
Upvotes: 1
Views: 86
Reputation: 4878
I think this regex will do what you want:
"((?<=\\d)(?=\\p{Alpha}))|((?<=\\w)(?=\\W))|((?<=\\W)(?=\\w))|((?<=\\W)(?=\\W))"
It splits the String at the following locations:
after a digit [0-9] and before a letter [a-zA-Z]
between a word character [a-zA-Z_0-9] and a non-word character, in either order
between two non-word characters
Upvotes: 0
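To see where the four-alternative look-around regex from the answer above actually splits, here is a minimal sketch that applies it to a few inputs. The class name LookaroundSplitDemo and the method tokenize are made up for this illustration; the pattern itself is copied unchanged from the answer.

```java
import java.util.Arrays;

public class LookaroundSplitDemo {
    // Pattern copied verbatim from the answer above: four zero-width alternatives.
    static final String BOUNDARIES =
        "((?<=\\d)(?=\\p{Alpha}))|((?<=\\w)(?=\\W))|((?<=\\W)(?=\\w))|((?<=\\W)(?=\\W))";

    // Hypothetical helper: split the input at every boundary the pattern matches.
    static String[] tokenize(String input) {
        return input.split(BOUNDARIES);
    }

    public static void main(String[] args) {
        String[] samples = {"a + b + 6", "abs(5 + 6)", "3a", "bzc15"};
        for (String sample : samples) {
            // Print each token in quotes so whitespace tokens stay visible.
            StringBuilder line = new StringBuilder(sample + " ->");
            for (String token : tokenize(sample)) {
                line.append(" \"").append(token).append('"');
            }
            System.out.println(line);
        }
    }
}
```

Note that the pattern never rejects anything: an input such as 1abc is split into 1 and abc instead of being reported as invalid, and because two adjacent non-word characters are always separated, a run of several spaces comes back as one token per space.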
Reputation: 424993
Try this:
String[] array = test.split("((?<=\\S)(?=\\s))|((?<=\\s)(?=\\S))");
I deduced that you want to split at the start, or the end, of whitespace. But the regex has to be zero-width, otherwise the whitespace would be consumed. This is achieved by using look-behinds and look-aheads, which are zero-width. The regexes in the look-arounds are:
\s means "a whitespace character"
\S means "a non-whitespace character"
Then there are the look-arounds:
(?<=regex) asserts that the preceding input matches regex
(?=regex) asserts that the following input matches regex
Then there's the OR:
(regex1)|(regex2) means "matches either regex1 or regex2"
Upvotes: 1
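As a quick check of the whitespace-boundary split described in the answer above, the sketch below runs it on the question's input and on two other strings. The class and method names are invented for the example; only the regex comes from the answer.

```java
import java.util.Arrays;

public class WhitespaceBoundarySplit {
    // Split wherever whitespace starts or ends; both alternatives are zero-width,
    // so no characters are consumed and the whitespace survives as its own token.
    static String[] splitAtWhitespaceEdges(String input) {
        return input.split("((?<=\\S)(?=\\s))|((?<=\\s)(?=\\S))");
    }

    public static void main(String[] args) {
        // The question's example: nine tokens, alternating text and single spaces.
        System.out.println(Arrays.toString(splitAtWhitespaceEdges("a + b + 6")));
        // A run of spaces stays together as one token.
        System.out.println(Arrays.toString(splitAtWhitespaceEdges("a  b")));
        // Without any whitespace there is nowhere to split.
        System.out.println(Arrays.toString(splitAtWhitespaceEdges("1+2")));
    }
}
```

The last call shows the limit of this approach: the split depends entirely on whitespace, so 1+2 comes back as a single element.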
Reputation: 4322
I am guessing here, but I think you want to parse mathematical statements; in other words, you are trying to perform lexical analysis (http://en.wikipedia.org/wiki/Lexical_analysis).
You might want to consider one of Java's fully developed lexical analyzer / parser generators for an easy solution. The only one that I have worked with is CUP (http://www.cs.princeton.edu/~appel/modern/java/CUP/), and it is quite easy to use.
Otherwise you will need to write some custom parser code.
String[] array = test.split("((?<=\\S)(?=\\s))|((?<=\\s)(?=\\S))");
or char[] charArr = test.toCharArray();
are inappropriate here, since the following are cases where you will get improperly tokenized results:
Input          Expected result        Result of bad solution
(2 + 4)        [(,2,+,4,)]            [(2,+,4)]
1+2            [1,+,2]                [1+2]
2 + 14(5)      [2,+,14,(,5,)]         [2,+,14(5)]
3a             [3,a]                  [3a]
abs(5 + 6)     [abs,(,5,+,6,)]        [abs(5,+,6)]
*Basically, this happens anywhere the input does not have an explicit space between tokens, which should be allowed but which the other suggested solutions do not support.
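For the "custom parser code" route, one possible starting point is a small hand-written lexer built on java.util.regex. This is only a sketch under assumptions: the token grammar below (identifier = a letter followed by letters or digits, number = digits, single-character operators, whitespace runs) is one reading of the question, and the names SimpleLexer, TOKEN and tokenize are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleLexer {
    // Assumed token grammar (not confirmed by the asker):
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z][A-Za-z0-9]*"   // identifier: starts with a letter, e.g. abc, bzc15
        + "|[0-9]+"              // number: 1, 23, 56
        + "|[-+*/()]"            // one operator or parenthesis
        + "|\\s+");              // a run of whitespace, kept to preserve positions

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Require a token to start exactly at pos, so nothing is silently skipped.
            matcher.region(pos, input.length());
            if (!matcher.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            tokens.add(matcher.group());
            pos = matcher.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {"(2 + 4)", "1+2", "2 + 14(5)", "3a", "abs(5 + 6)"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + tokenize(sample));
        }
    }
}
```

Unlike the split-based approaches, this does not rely on spaces between tokens, and it fails loudly on characters outside the grammar. Whitespace tokens are kept because the question asks to preserve positions; filter them out to get the results in the "Expected result" column.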
Upvotes: 0
Reputation: 12843
Try this:
char[] charArr = test.toCharArray();
Example:
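import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        String test = "a + b + 6";
        // Split the string into individual characters, spaces included
        char[] charArr = test.toCharArray();
        System.out.println(Arrays.toString(charArr));
    }
}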
Output:
[a,  , +,  , b,  , +,  , 6]
Upvotes: 0