m3div0

Reputation: 1586

JavaScript regex for splitting by whitespace with accented characters

I am trying to split a string in JavaScript by whitespace while ignoring whitespace enclosed in quotes. I found this regular expression by searching: (/\w+|"[^"]+"/g). The problem is that it doesn't work with accented characters such as á. How should I change the regular expression to make it work?

Upvotes: 0

Views: 137

Answers (3)

Tim Goodman

Reputation: 23976

This matches runs of characters that are neither whitespace nor quotes, and also matches text between quotes:

/[^\s"]+|"[^"]+"/g
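As a quick check, here is a minimal sketch that runs the pattern from the question and the pattern above against the same string. The sample input and the variable names are assumptions made up for illustration; they are not from the question.

```javascript
// Sample input (hypothetical): accented words plus one quoted phrase.
const input = 'árvíz "tűrő tükör" gép';

// Pattern from the question: \w only covers [A-Za-z0-9_],
// so every accented letter breaks a word apart.
const withWordClass = input.match(/\w+|"[^"]+"/g);

// Pattern from this answer: any run of characters that are
// neither whitespace nor double quotes, or a quoted phrase.
const withNegatedClass = input.match(/[^\s"]+|"[^"]+"/g);

console.log(withWordClass);    // [ 'rv', 'z', '"tűrő tükör"', 'g', 'p' ]
console.log(withNegatedClass); // [ 'árvíz', '"tűrő tükör"', 'gép' ]
```

Because the first branch excludes the quote character, it can never start consuming a quoted phrase, so the second branch always gets the chance to match it whole.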

Upvotes: 1

Bergi

Reputation: 664297

If you want to match all non-whitespace characters instead of only alphanumeric ones, replace \w with \S.
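One caveat is worth checking when making this substitution: alternation tries the left branch first, and \S also matches the quote character, so \S+ can swallow the opening quote before the quoted branch is tried. The sketch below compares the direct substitution with a variant that puts the quoted branch first. The reordering and the sample string are assumptions for illustration, not something stated in the answer.

```javascript
// Sample input (hypothetical).
const sample = 'piña "crème brûlée" café';

// Direct substitution of \S for \w: \S+ wins at the opening quote,
// so the quoted phrase is split at its inner whitespace.
const directSwap = sample.match(/\S+|"[^"]+"/g);

// Quoted branch first: the phrase is matched whole before \S+ is tried.
const quotedFirst = sample.match(/"[^"]+"|\S+/g);

console.log(directSwap);  // [ 'piña', '"crème', 'brûlée"', 'café' ]
console.log(quotedFirst); // [ 'piña', '"crème brûlée"', 'café' ]
```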

Upvotes: 0

João Silva

Reputation: 91299

That's because \w only matches [A-Za-z0-9_]. To match accented characters, add the range \x81-\xFF, which covers Latin-1 characters such as à and ã (accented letters outside Latin-1 need a wider range):

(/[\w\x81-\xFF]+|"[^"]+"/g)

There's also this site, which is very helpful to build the required unicode block range.
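For comparison, here is a minimal sketch of the Latin-1 range next to Unicode property escapes, a different technique from the one in this answer. It assumes an engine that supports ES2018 (\p{...} requires the u flag); the sample string and variable names are made up for illustration.

```javascript
// Sample input (hypothetical): ñ, è, û and é are in Latin-1; ő (U+0151) is not.
const text = 'mañana "crème brûlée" kőszeg';

// Range from this answer: handles Latin-1 letters, but ő falls outside it.
const latin1 = text.match(/[\w\x81-\xFF]+|"[^"]+"/g);

// Unicode property escapes: \p{L} is any letter, \p{N} is any number.
const unicodeAware = text.match(/[\p{L}\p{N}_]+|"[^"]+"/gu);

console.log(latin1);       // [ 'mañana', '"crème brûlée"', 'k', 'szeg' ]
console.log(unicodeAware); // [ 'mañana', '"crème brûlée"', 'kőszeg' ]
```

The property-escape version avoids maintaining code point ranges by hand, at the cost of requiring a newer JavaScript engine.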

Upvotes: 1
