Zi1mann

Reputation: 344

Regex for splitting at every character but keeping numbers and decimals together

I've asked a similar question before and ended up using this regex: "(?<!^)(?=\\D)|(?<=\\D)" to split a String like this:

String input = "x^(24-3x)";
// the regex from above: split before and after every non-digit character
String[] signs = input.split("(?<!^)(?=\\D)|(?<=\\D)");
for (int i = 0; i < signs.length; i++) {
    System.out.println(signs[i]);
}

which produces: "x", "^", "(", "24", "-", "3", "x", ")"

Now I need to split a String under the same conditions, but also keep decimal numbers together. Currently, an input like (0.5) results in "(", "0", ".", "5", ")", but I need the decimal number to stay grouped in one string: "(", "0.5", ")". Thank you.

Upvotes: 0

Views: 579

Answers (1)

tobias_k

Reputation: 82899

Instead of splitting in between the tokens, you could quite easily define a regex that matches the different kinds of tokens, for example something like [0-9]+|[a-z]+|[()^*/+-], i.e. one or more digits, or one or more letters, or any single special character. In practice, this may require a bit more elaboration, e.g. to account for decimal numbers:

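// needs java.util.ArrayList, java.util.List, java.util.regex.Matcher, java.util.regex.Pattern
List<String> tokens = new ArrayList<>();
Pattern p = Pattern.compile("(\\d+(\\.\\d+)?)|[a-zA-Z]+|[()^*/+-]");
Matcher m = p.matcher("exp(42) * x^(24-3x) - 3.14");
// collect every match; characters that match no alternative (here: spaces) are skipped
while (m.find()) {
    tokens.add(m.group());
}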

Result for tokens is [exp, (, 42, ), *, x, ^, (, 24, -, 3, x, ), -, 3.14]

Taking a closer look at the components of the regex:

  • (\\d+(\\.\\d+)?) One or more digits, optionally followed by a dot and more digits. If you also want to allow numbers such as .1 or 42. you need to change this a bit.
  • [a-zA-Z]+ One or more letters; if you want to allow variables with underscores or digits, such as var_23, you might extend this to something like ([a-zA-Z_]\\w*). Note the * rather than +, so that single-letter names such as x still match.
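  • [()^*/+-] A single special character, such as an operator or a bracket. Note that the - comes last so it is not interpreted as a range. If you also have multi-character operators, such as != or <=, you could change this to another disjunction: \\+|-|==|<=|... Outside a character class the + has to be escaped, and longer operators should come before their prefixes (<= before <), because the alternatives are tried from left to right.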
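Putting the extensions from the list above together, a rough sketch could look like the following. The class and method names (ExpressionTokenizer, tokenize, splitTokens) are made up for illustration, and the set of comparison operators is just an assumption about what your expressions might contain. The second method shows how the split-based regex from the question could be adapted instead, by treating the dot like a digit; note that it would also keep something like 1.2.3 together and does not drop whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class ExpressionTokenizer {

    // Alternatives are tried left to right, so multi-character operators
    // are listed before the single-character class.
    private static final Pattern TOKEN = Pattern.compile(
          "\\d+(?:\\.\\d*)?|\\.\\d+"  // numbers: 42, 42., 3.14, .1
        + "|[a-zA-Z_]\\w*"            // identifiers: x, exp, var_23
        + "|==|!=|<=|>="              // multi-character operators (assumed set)
        + "|[()^*/+<>-]");            // single characters; '-' last, so no range

    // Matching approach: collect every token the pattern finds.
    // Characters that match no alternative (e.g. spaces) are silently skipped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Split approach from the question, with '.' treated like a digit:
    // split before and after every character that is neither a digit nor a dot.
    static List<String> splitTokens(String input) {
        return Arrays.asList(input.split("(?<!^)(?=[^\\d.])|(?<=[^\\d.])"));
    }

    public static void main(String[] args) {
        // prints [var_23, <=, .5, *, (, 42., -, 3, x, )]
        System.out.println(tokenize("var_23 <= .5 * (42. - 3x)"));
        // prints [(, 0.5, )]
        System.out.println(splitTokens("(0.5)"));
    }
}
```

The matching version is probably easier to extend, since each kind of token gets its own alternative, whereas the split version has to describe the gaps between tokens.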

Upvotes: 2
