Dark Knight

Reputation: 307

Java Regex: Split based on non-word characters except for apostrophe

I'm trying to split a string on whitespace and non-word characters, keeping each non-word character as its own token, except that apostrophes should not trigger a split.

Splitting on whitespace and non-word characters already works, but I can't figure out how to exclude apostrophes from the non-word characters.

This is my current Regex...

str.split("\\s|(?=\\W)");

...which when run on this code sample:

program p;
begin
    write('x');
end.

...produces this result:

program
p
;
begin

write
(
'x   <!-- This is the problem.
'
)
;
end
.

Which is almost correct, but my goal is to skip the apostrophes so that this is the result:

program
p
;
begin

write
(
'x'   <!-- This is the wanted result.
)
;
end
.

UPDATE

As suggested I've tried:

str.split("\\s|(?=\\W)(?<=\\W)");

Which almost works, but does not split all of the special characters correctly:

program
p;
begin
write(
'x'
)
;
end.

Upvotes: 4

Views: 2455

Answers (4)

Val Kalinichenko

Reputation: 351

As an alternative, you can scan the string for matches of \b[\w']+\b
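
A minimal sketch of what such a scan could look like with java.util.regex. The class and method names (WordScan, scan) are made up for illustration. Note that this pattern only collects words: punctuation is dropped, and an apostrophe at the start or end of a match is cut off by the \b anchors, so on its own it does not produce the token list the question asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordScan {
    // Runs of word characters and apostrophes, anchored at word boundaries.
    private static final Pattern WORD = Pattern.compile("\\b[\\w']+\\b");

    // Hypothetical helper: collects every match instead of splitting.
    static List<String> scan(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(scan("don't stop"));   // [don't, stop]
        System.out.println(scan("write('x');"));  // [write, x]
    }
}
```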

Upvotes: 0

Bohemian

Reputation: 425003

Treat the apostrophe separately, requiring a preceding non-word character:

str.split("\\s+|(?=[^\\w'])|(?<=\\W)(?=')");
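
For reference, here is a small sketch that runs this split on the sample from the question. The class and method names (ApostropheSplit, tokenize) are only illustrative. Because whitespace is consumed by \s+, the result has no empty token between begin and write. A quoted literal that itself contains spaces or punctuation would still be broken up.

```java
import java.util.Arrays;
import java.util.List;

class ApostropheSplit {
    // Split on whitespace runs, before any non-word character other than an
    // apostrophe, and before an apostrophe that follows a non-word character.
    static List<String> tokenize(String src) {
        return Arrays.asList(src.split("\\s+|(?=[^\\w'])|(?<=\\W)(?=')"));
    }

    public static void main(String[] args) {
        String src = "program p;\nbegin\n    write('x');\nend.";
        System.out.println(tokenize(src));
        // [program, p, ;, begin, write, (, 'x', ), ;, end, .]
    }
}
```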

See live demo.

Upvotes: 0

wpcarro

Reputation: 1546

Have you tried...

[^\w']

This will match any character that is neither a word character nor an apostrophe. It may be simple enough to work, depending on your inputs.

If you run a replace operation using ([^\w']) as your regex and \n$1\n as your replacement string (Java refers to a captured group as $1 in the replacement, so the pattern needs the capturing parentheses), it should get you close to where you'd like to be.
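
A rough sketch of the replace idea in Java, followed by a split on whitespace to turn the result into tokens. The names ReplaceThenSplit and tokenize are invented for this example, and the trailing split step is an assumption about how the output would be consumed. In Java the pattern needs a capturing group and the replacement refers to it as $1.

```java
import java.util.Arrays;
import java.util.List;

class ReplaceThenSplit {
    static List<String> tokenize(String src) {
        // Surround every character that is neither a word character nor an
        // apostrophe with newlines, so it stands alone on its own line.
        String spaced = src.replaceAll("([^\\w'])", "\n$1\n");
        // Whitespace from the source was wrapped too; collapse it all away.
        return Arrays.asList(spaced.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String src = "program p;\nbegin\n    write('x');\nend.";
        System.out.println(tokenize(src));
        // [program, p, ;, begin, write, (, 'x', ), ;, end, .]
    }
}
```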

Upvotes: 7

vks

Reputation: 67968

You can split on this.

\s|('[^']*')|(?=\W)

See demo.

https://regex101.com/r/mL7eL6/1
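
One caveat for Java: String.split discards whatever the pattern consumes, so a quoted literal matched by the ('[^']*') alternative would not appear in the result. A different technique that keeps the same idea of treating a quoted literal as one unit is to match tokens with a Matcher loop instead of splitting. The sketch below does that; the pattern and the names QuoteAwareTokenizer and tokenize are assumptions for illustration, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareTokenizer {
    // Alternatives are tried in order: a whole quoted literal, a run of word
    // characters, or a single character that is neither whitespace nor word.
    private static final Pattern TOKEN =
            Pattern.compile("'[^']*'|\\w+|[^\\s\\w]");

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("program p;\nbegin\n    write('x');\nend."));
        // [program, p, ;, begin, write, (, 'x', ), ;, end, .]
        System.out.println(tokenize("say 'a b';"));
        // [say, 'a b', ;]
    }
}
```

Unlike the split-based versions, this keeps a literal such as 'a b' together even though it contains a space.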

Upvotes: 1
