newbie

Reputation: 537

Grouping using Regex in Java

I'm new to regex, so apologies if this is a basic question.
My problem is that I want to split a String into groups of tokens.

What I want to get is the following:

  1. Identifiers: letters a-z A-Z, optionally followed by digits 0-9 (e.g. abc, bzc15, but not 1abc or 14bc)
  2. Numbers: digits 0-9 (e.g. 1, 23, 56, etc.)
  3. The operators + * - /
  4. Whitespace
  5. The parentheses ( and )

I want to group them in an array and preserve their position if possible.

Ex:

String test = "a + b + 6";

The result should be something like this

Array[0] = a
Array[1] = White Space
Array[2] = +
Array[3] = White Space
Array[4] = b
Array[5] = White Space
Array[6] = +
Array[7] = White Space
Array[8] = 6

Is this possible? If yes, what pattern should I use?
Any help will be appreciated.

Upvotes: 1

Views: 86

Answers (4)

Lone nebula

Reputation: 4878

I think this regex will do what you want:

"((?<=\\d)(?=\\p{Alpha}))|((?<=\\w)(?=\\W))|((?<=\\W)(?=\\w))|((?<=\\W)(?=\\W))"

It splits the String at the following locations:

  • After a digit [0-9] and before a letter [a-zA-Z].
  • Between a word-character [a-zA-Z_0-9] and a non-word character.
  • Between two non-word characters.
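
A minimal sketch of how this pattern behaves with String.split, using the regex from this answer unchanged. The class name SplitDemo and the helper tokenize are made up for illustration, and the second input is an extra test case that does not appear in the question.

```java
import java.util.Arrays;

class SplitDemo {
    // Regex copied from the answer above; every alternative is zero-width,
    // so no characters are consumed by the split.
    static final String BOUNDARY =
        "((?<=\\d)(?=\\p{Alpha}))|((?<=\\w)(?=\\W))|((?<=\\W)(?=\\w))|((?<=\\W)(?=\\W))";

    // Hypothetical helper: splits the input at every boundary matched above.
    static String[] tokenize(String input) {
        return input.split(BOUNDARY);
    }

    public static void main(String[] args) {
        // Whitespace is kept as its own element.
        System.out.println(Arrays.toString(tokenize("a + b + 6")));
        // No spaces needed between tokens.
        System.out.println(Arrays.toString(tokenize("abs(5+6)")));
    }
}
```

Two things to keep in mind with this approach: every non-word character becomes its own element, so a run of two spaces yields two separate whitespace entries, and \w includes the underscore, so a_b stays in one piece.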

Upvotes: 0

Bohemian

Reputation: 424993

Try this:

String[] array = test.split("((?<=\\S)(?=\\s))|((?<=\\s)(?=\\S))");

I deduced that you want to split at the start, or the end, of whitespace. But the regex has to be zero-width, otherwise the whitespace would be consumed. This is achieved by using lookbehinds and lookaheads, which are zero-width. The regexes in the lookarounds are:

  • \s means "a whitespace character"
  • \S means "a non-whitespace character"

Then there are the lookarounds:

  • (?<=regex) asserts that the preceding input matches regex
  • (?=regex) asserts that the following input matches regex

Then there's the OR:

  • (regex1)|(regex2) means "matches either regex1 or regex2"
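
Putting those pieces together, here is a small runnable sketch of the split. The class name WhitespaceSplitDemo and the helper splitAtWhitespaceEdges are hypothetical, and the second input is an extra case added only to show what happens when there is no whitespace to split on.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {
    // Splits at every boundary between a non-whitespace and a whitespace
    // character, in either order. Both alternatives are zero-width.
    static String[] splitAtWhitespaceEdges(String input) {
        return input.split("((?<=\\S)(?=\\s))|((?<=\\s)(?=\\S))");
    }

    public static void main(String[] args) {
        // Each space ends up as its own array element.
        System.out.println(Arrays.toString(splitAtWhitespaceEdges("a + b + 6")));
        // With no whitespace there is no boundary, so the input stays whole.
        System.out.println(Arrays.toString(splitAtWhitespaceEdges("1+2")));
    }
}
```

Because the split points depend only on whitespace, a run of several spaces stays together as one element, while tokens written without spaces between them are not separated.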

Upvotes: 1

gbtimmon

Reputation: 4322

I am guessing here, but I think you want to parse mathematical expressions; in other words, you are trying to perform lexical analysis (http://en.wikipedia.org/wiki/Lexical_analysis).

You might want to consider one of Java's fully developed lexer / parser generators for an easy solution. The only one that I have worked with is CUP (http://www.cs.princeton.edu/~appel/modern/java/CUP/), and it is quite easy to use.

Otherwise you will need to write some custom parser code.

String[] array = test.split("((?<=\\S)(?=\\s))|((?<=\\s)(?=\\S))"); or char[] charArr = test.toCharArray(); are inappropriate here, since in the following cases they produce improperly tokenized results:

input       Expected Result     Result of bad solution
(2 + 4)     [(,2,+,4,)]         [(2,+,4)]
1+2         [1,+,2]             [1+2]
2 + 14(5)   [2,+,14,(,5,)]      [2,+,14(5)]
3a          [3,a]               [3a]
abs(5 + 6)  [abs,(,5,+,6,)]     [abs(5,+,6)]

* Basically, this happens anywhere the input has no explicit space between tokens,
which should be allowed but which the other suggested solutions do not support.
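
As a rough idea of what such custom code could look like, here is a minimal sketch that walks the input with java.util.regex.Matcher and one alternative per token class from the question. The class name SimpleLexer, the method tokenize and the keepWhitespace flag are all made up for this example; treating a run of whitespace as a single token and rejecting unknown characters are my assumptions, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleLexer {
    // One alternative per token class:
    // identifier | integer | operator | whitespace run | parenthesis
    private static final Pattern TOKEN =
        Pattern.compile("[a-zA-Z][a-zA-Z0-9]*|[0-9]+|[-+*/]|\\s+|[()]");

    // Hypothetical helper: returns the tokens in input order.
    static List<String> tokenize(String input, boolean keepWhitespace) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Match exactly at pos so unknown characters are reported, not skipped.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            String token = m.group();
            // A whitespace token trims down to the empty string.
            if (keepWhitespace || !token.trim().isEmpty()) {
                tokens.add(token);
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("2 + 14(5)", false));
        System.out.println(tokenize("abs(5 + 6)", false));
        System.out.println(tokenize("a + b + 6", true));
    }
}
```

With keepWhitespace set to false this yields the "Expected Result" column of the table for those inputs. It only produces tokens; checking that the tokens form a valid expression is still the parser's job.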

Upvotes: 0

Achintya Jha

Reputation: 12843

Try this:

char[] charArr = test.toCharArray();

Example:

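import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        String test = "a + b + 6";
        // Every character, including each space, becomes its own element
        char[] charArr = test.toCharArray();
        System.out.println(Arrays.toString(charArr));
    }
}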

Output:

[a,  , +,  , b,  , +,  , 6]

Upvotes: 0
