Reputation: 21179
I need to tokenize a string in Java so that it is split into a String array. Each token in this array should be either a word, a number, or a set of dimensions of the form 23 x 34 x 56, etc. I tried to code this as:
String[] split_text = text.split("\\s | (\\d{3},)*\\d{3}([.]\\d)* x (\\d{3},)*\\d{3}([.]\\d)* | \\d*([.]\\d)* x \\d*([.]\\d)*");
But this gives a syntax error. Can anyone tell me how to do this using regular expressions, and whether there is a problem with the way I have written the regular expression in Java?
Upvotes: 0
Views: 234
Reputation: 8560
To match any pair of numbers with dots or commas and an x in the middle you could do something like this:
(\d*(?:[.,]\d+)* x \d*(?:[.,]\d+)*)
or for pairs and triples:
(\d*(?:[.,]\d+)*(?: x \d*(?:[.,]\d+)*){1,2})
so maybe this is the expression you need:
((?:\d*(?:[.,]\d+)*(?: x \d*(?:[.,]\d+)*){1,2})|\s|\w+)
See here: http://rubular.com/r/snAiI7GMT7 - a great site for testing.
You might want to replace the \w with \p{L} to cover all Unicode words in Java.
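A minimal Java sketch of this matching approach, assuming it is acceptable to collect tokens with Matcher.find() instead of String.split. The class name DimensionTokenizer, the tokenize helper, and the sample sentence are hypothetical, and the pattern is a tightened variant of the expression above (it requires at least one digit per number and uses \p{L}+ for words), not the exact regex from this answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class DimensionTokenizer {
    // A number: digits, optionally followed by groups like ",234" or ".5".
    private static final String NUMBER = "\\d+(?:[.,]\\d+)*";

    // Alternatives are tried left to right, so dimensions win over plain numbers.
    private static final Pattern TOKEN = Pattern.compile(
            NUMBER + "(?: x " + NUMBER + "){1,2}" // dimensions: pairs or triples
            + "|" + NUMBER                        // a single number
            + "|\\p{L}+");                        // a word made of Unicode letters

    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        // find() skips anything that is not a token, such as whitespace.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("box 23 x 34 x 56 weighs 1,234.5 kg");
        // prints: box|23 x 34 x 56|weighs|1,234.5|kg
        System.out.println(String.join("|", tokens));
    }
}
```

Matching tokens rather than splitting on separators keeps each dimension together as one array element, which is hard to get from split alone.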
Upvotes: 1
Reputation: 336378
I don't see a syntax error in your regex, but there are a few problems:
- Use \. to match a literal dot.
- (\d{3},)*\d{3}([.]\d)* will match 123,456,789.1.1.1.1 but not 1,234.67. Is that really what you intended?
- <number> x <number> will only match pairs of numbers, not triplets as in your example.
I think it's best if you update your specifications a little. What exactly do you want to match, and what don't you? Give a few examples. Think of corner cases (is a leading zero allowed? Can it be dropped, as in .12? How about 1.4E-45? And so on).
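The matching behaviour described above can be checked with a short Java sketch. The class name NumberPatternCheck and the isNumber helper are hypothetical; the pattern is the number sub-expression from the question, written as a Java string literal (so each backslash is doubled):

```java
import java.util.regex.Pattern;

// Hypothetical helper for trying the question's number pattern on sample inputs.
class NumberPatternCheck {
    // Groups of exactly three digits separated by commas, then any number of ".d" parts.
    static final Pattern NUMBER = Pattern.compile("(\\d{3},)*\\d{3}([.]\\d)*");

    // matches() requires the whole input to fit the pattern.
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumber("123,456,789.1.1.1.1")); // true
        System.out.println(isNumber("1,234.67"));            // false
    }
}
```

The second input fails because the pattern demands three digits before every comma and allows only one digit after each dot.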
Upvotes: 0
Reputation: 455272
String.split returns an array of Strings, so declare split_text as an array:
String[] split_text = ...
      ^^
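As a small illustration of the return type, here is a sketch that assigns the result of split to a String[]. The class name SplitReturnsArray, the splitOnWhitespace helper, and the whitespace-only pattern are assumptions made for the example, not the regex from the question:

```java
import java.util.Arrays;

// Hypothetical demo class showing that String.split yields a String[].
class SplitReturnsArray {
    static String[] splitOnWhitespace(String text) {
        // Assigning this result to a plain String would not compile.
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] splitText = splitOnWhitespace("23 x 34");
        System.out.println(Arrays.toString(splitText)); // [23, x, 34]
    }
}
```

Note that splitting on whitespace alone breaks a dimension into separate pieces, so it only demonstrates the array type and does not solve the tokenizing problem.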
Upvotes: 0