kakopappa

Reputation: 5085

JavaScript regex Unicode help

In JavaScript, I am using split(/\W+/) to break a string into words.

When I split the string below, it returns the wrong values:

var s3 = "bardzo dziękuję";
s3 = s3.split(/\W+/);


[0]: "bardzo"
[1]: "dzi"
[2]: "kuj"

How can I fix this problem? Please advise.

Upvotes: 1

Views: 322

Answers (3)

jwl

Reputation: 10514

You could use CharFunk (https://raw.github.com/joelarson4/CharFunk), which handles Unicode fully.

var s3 = "bardzo dziękuję";

function notLetterOrDigit(ch) {
    return !CharFunk.isLetterOrDigit(ch);
}

CharFunk.splitOnMatches(s3, notLetterOrDigit);

Upvotes: 1

Paul Alan Taylor

Reputation: 10680

The regex is splitting in the wrong places because \W treats your accented characters as non-word characters.

If your words are separated only by whitespace, split on the whitespace character class instead:

s3 = s3.split(/\s+/);
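
To make the cause concrete: in JavaScript, \w matches only [A-Za-z0-9_], so \W counts "ę" as a separator. Here is a minimal sketch comparing the two splits. The variable names are only illustrative, and the normalize("NFC") call is my own addition, on the assumption that the input might contain decomposed characters.

```javascript
// Assumption: normalize to NFC so "ę" is a single code point (U+0119).
const text = "bardzo dziękuję".normalize("NFC");

// \W is the complement of [A-Za-z0-9_], so each "ę" acts as a separator.
// The trailing "ę" also leaves an empty string at the end of the array.
const asciiSplit = text.split(/\W+/);

// \s matches only whitespace, so accented letters stay inside their words.
const whitespaceSplit = text.split(/\s+/);

console.log(asciiSplit);      // [ 'bardzo', 'dzi', 'kuj', '' ]
console.log(whitespaceSplit); // [ 'bardzo', 'dziękuję' ]
```

Note that splitting on whitespace keeps any punctuation attached to the neighbouring word, so it only fits input that is purely space-separated.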

Upvotes: 1

Matt

Reputation: 44058

In this case, why not just split on whitespace?

s3.split(/\s+/);
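
If the text may also contain punctuation, one possible alternative is a Unicode-aware version of \W built from Unicode property escapes. This is only a sketch: it assumes an engine with ES2018 support for \p{...} and the u flag, and splitWords is a hypothetical helper name, not something from the question.

```javascript
// Hypothetical helper: split on runs of characters that are not
// Unicode letters (\p{L}), combining marks (\p{M}), digits (\p{N}) or "_".
// Requires the "u" flag and ES2018 Unicode property escapes.
function splitWords(text) {
    return text
        .normalize("NFC")                    // assumption: compose accents first
        .split(/[^\p{L}\p{M}\p{N}_]+/u)      // Unicode-aware counterpart of /\W+/
        .filter(Boolean);                    // drop empty strings at the edges
}

console.log(splitWords("bardzo dziękuję"));           // [ 'bardzo', 'dziękuję' ]
console.log(splitWords("bardzo dziękuję, świecie!")); // [ 'bardzo', 'dziękuję', 'świecie' ]
```

On older engines without property escapes, this pattern throws a SyntaxError, so the whitespace split above or a library remains the safer choice there.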

Upvotes: 1
