Gabriel Pacheco

Reputation: 309

How to match Asian characters using regex?

I have to extract quoted phrases from response data using Dart, and this regex works well for that:

\B"[^"]*"\B

It matches phrases well, but it seems to exclude Asian characters such as Japanese kanji, Chinese and Korean (and other non-Latin scripts like Russian).

var regex = RegExp(r'\B"[^"]*"\B');
Iterable<Match> matches = regex.allMatches(returnString);
for (final match in matches) {
  t.add(match.group(0)!); // group(0) is the whole quoted phrase, quotes included
}

How can I make it match these characters alongside Western (Latin) characters too? Or, if I need a new regex, can you help me rewrite it? Thank you, and sorry for my lack of knowledge and my bad English.

Upvotes: 3

Views: 1825

Answers (2)

lrn

Reputation: 71828

The RegExp \B"[^"]*"\B relies on the \B escape, a "non-word-boundary" zero-width assertion. It matches only where \b does not: where the characters on both sides are word characters (ASCII a-z, A-Z, 0-9 or _), or where both are non-word characters (the positions before the start and after the end of the input count as non-word). Since " is not a word character, the pattern matches only when the opening quote is preceded by a non-word character or the start of the input, and the closing quote is followed by a non-word character or the end of the input. It should match any non-quote character between those two quotes, no matter what script it is in. The word/non-word distinction is ASCII-only, though, so I'm guessing those assertions are the ones causing you issues.
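A small Dart sketch can make the \B behaviour concrete. The helper `quoted`, the two patterns' variable names and the sample strings are all hypothetical, made up for illustration rather than taken from the asker's data. It compares the original pattern with a variant that simply drops the \B assertions; whether dropping them is acceptable depends on what the real input looks like.

```dart
// Hypothetical helper: returns every substring matched by [pattern],
// quotes included.
List<String> quoted(RegExp pattern, String input) =>
    pattern.allMatches(input).map((m) => m.group(0)!).toList();

// The pattern from the question, and a variant without \B.
final withB = RegExp(r'\B"[^"]*"\B');
final withoutB = RegExp(r'"[^"]*"');

void main() {
  // Quotes surrounded by spaces: \B holds on both sides, so the
  // Japanese text inside the quotes is matched.
  print(quoted(withB, 'say "こんにちは" now')); // ["こんにちは"]

  // Quotes glued to ASCII letters: \B fails at both quotes.
  print(quoted(withB, 'say"hi"now')); // []

  // Quotes glued to CJK characters: these are not ASCII word
  // characters, so \B holds and the phrase is matched.
  print(quoted(withB, '語"テスト"語')); // ["テスト"]

  // Without \B, the characters next to the quotes no longer matter.
  print(quoted(withoutB, 'say"hi"now')); // ["hi"]
}
```

In other words, the script of the text between the quotes does not matter; what decides the match is whether the characters right outside the quotes are ASCII word characters.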

It's not clear from this alone exactly what it is you want to achieve. Can you describe the strings that you want to match, and some examples of strings that you don't want to match?

Upvotes: 0

Alexandre Ardhuin

Reputation: 76333

To match any non-ASCII character, you can use RegExp(r'[^\x00-\x7F]').
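A minimal sketch of how that character class might be used in Dart; the names `nonAscii` and `nonAsciiRun` and the sample text are hypothetical. Adding `+` groups consecutive non-ASCII characters into one match. Because a RegExp created without `unicode: true` works on UTF-16 code units, the single-character class would match each half of a character outside the Basic Multilingual Plane separately, whereas the `+` version keeps both halves in the same run.

```dart
// The class from the answer: any single code unit outside 0x00-0x7F.
final nonAscii = RegExp(r'[^\x00-\x7F]');

// Same class with `+`, so a run of non-ASCII text is one match.
final nonAsciiRun = RegExp(r'[^\x00-\x7F]+');

void main() {
  const text = 'Tokyo 東京, Moscow Москва';

  print(nonAscii.hasMatch('plain ascii')); // false
  print(nonAscii.hasMatch(text)); // true

  // Spaces and commas are ASCII, so they split the runs.
  print(nonAsciiRun.allMatches(text).map((m) => m.group(0)).toList());
  // [東京, Москва]
}
```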

Upvotes: 4
