zvisofer

Reputation: 1368

Java Regex ReplaceAll with grouping

I want to surround all tokens in a text with tags in the following manner:

Input: " abc fg asd "

Output: " <token>abc</token> <token>fg</token> <token>asd</token> "

This is the code I tried so far:

String regex = "(\\s)([a-zA-Z]+)(\\s)";
String text = " abc fg      asd ";
text = text.replaceAll(regex, "$1<token>$2</token>$3");
System.out.println(text);

Output: " <token>abc</token> fg      <token>asd</token> "

Note: for simplicity, we can assume that the input starts and ends with whitespace.

Upvotes: 0

Views: 84

Answers (3)

Attila Neparáczki

Reputation: 476

// [^\\s]+ means "not whitespace, one or more times"
String result = input.replaceAll("([^\\s]+)", "<token>$1</token>");

This matches every run of characters that isn't whitespace, which is probably the best fit for what you need. The quantifier is also greedy, so it always consumes the whole run: it will never match just "as" inside "asd" while another matching character follows.
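As a rough check of the behaviour described above, here is a minimal sketch that applies the same pattern to the question's input. The class and method names (NonWhitespaceTokens, tag) are made up for illustration. Note that this pattern also wraps digits and punctuation, which may or may not be what you want.

```java
// Sketch only: NonWhitespaceTokens and tag are hypothetical names.
class NonWhitespaceTokens {
    // Wraps every maximal run of non-whitespace characters in <token> tags.
    static String tag(String input) {
        return input.replaceAll("([^\\s]+)", "<token>$1</token>");
    }

    public static void main(String[] args) {
        // Whitespace between tokens is left untouched, however long it is.
        System.out.println(tag(" abc fg      asd "));
        // prints: " <token>abc</token> <token>fg</token>      <token>asd</token> " (without the quotes)

        // Anything that is not whitespace counts as part of a token.
        System.out.println(tag(" a1, b "));
        // prints: " <token>a1,</token> <token>b</token> " (without the quotes)
    }
}
```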

Upvotes: 0

Casimir et Hippolyte

Reputation: 89557

If your tokens are defined only by a character class, you don't need to describe the characters around them. This should suffice, since the regex engine walks from left to right and the quantifier is greedy:

String regex = "[a-zA-Z]+";
text = text.replaceAll(regex, "<token>$0</token>");
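A small sketch of this approach, assuming the question's input; the class and method names (LetterTokens, tag) are hypothetical. $0 refers to the whole match, so no capturing group is needed. One caveat worth checking against your data: letters inside a mixed token such as "x1" are wrapped too, because nothing in the pattern requires whitespace around the match.

```java
// Sketch only: LetterTokens and tag are hypothetical names.
class LetterTokens {
    // Wraps every maximal run of ASCII letters; $0 is the entire match.
    static String tag(String text) {
        return text.replaceAll("[a-zA-Z]+", "<token>$0</token>");
    }

    public static void main(String[] args) {
        System.out.println(tag(" abc fg      asd "));
        // prints: " <token>abc</token> <token>fg</token>      <token>asd</token> " (without the quotes)

        // The letter part of a mixed token is wrapped as well.
        System.out.println(tag(" abc x1 "));
        // prints: " <token>abc</token> <token>x</token>1 " (without the quotes)
    }
}
```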

Upvotes: 0

Toto

Reputation: 91415

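Use lookarounds. Your pattern consumes the whitespace after each token, so that whitespace is no longer available as the leading whitespace of the next token. Lookarounds assert the whitespace without consuming it: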

String regex = "(?<=\\s)([a-zA-Z]+)(?=\\s)";
...
text = text.replaceAll(regex, "<token>$1</token>");
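To make the difference concrete, the sketch below runs the question's consuming pattern and the lookaround pattern side by side. The class and method names (LookaroundTokens, consuming, lookaround) are invented for the example.

```java
// Sketch only: LookaroundTokens, consuming and lookaround are hypothetical names.
class LookaroundTokens {
    // The question's pattern: the whitespace on both sides is part of the match.
    static String consuming(String text) {
        return text.replaceAll("(\\s)([a-zA-Z]+)(\\s)", "$1<token>$2</token>$3");
    }

    // The whitespace is only asserted, so adjacent tokens can share one space.
    static String lookaround(String text) {
        return text.replaceAll("(?<=\\s)([a-zA-Z]+)(?=\\s)", "<token>$1</token>");
    }

    public static void main(String[] args) {
        String text = " abc fg asd ";
        System.out.println(consuming(text));
        // prints: " <token>abc</token> fg <token>asd</token> " (without the quotes)
        System.out.println(lookaround(text));
        // prints: " <token>abc</token> <token>fg</token> <token>asd</token> " (without the quotes)

        // A token must be all letters and delimited by whitespace, so "x1" is skipped.
        System.out.println(lookaround(" abc x1 "));
        // prints: " <token>abc</token> x1 " (without the quotes)
    }
}
```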

Upvotes: 2
