Reputation: 309
I have to extract phrase strings from response data using Dart, and I'm doing it with this regex:
\B"[^"]*"\B
It matches phrases well, but it excludes Asian characters such as kanji, and other non-Latin scripts (Japanese, Chinese, Korean, Russian, etc.).
var regex = RegExp(r'\B"[^"]*"\B');
Iterable<Match> matches = regex.allMatches(returnString);
// t is the list that collects the matched phrases
for (final match in matches) {
  t.add(match.group(0));
}
How can I make it match these characters alongside Western characters too? Or, if I need a new regex, can you help me rewrite it? Thank you, and sorry for my lack of knowledge and bad English.
Upvotes: 3
Views: 1825
Reputation: 71828
The RegExp \B"[^"]*"\B relies on the \B escape, a "non-word-boundary" zero-width assertion. It matches only at positions where the characters on both sides are word characters (ASCII a-z, A-Z, 0-9 or _) or where neither of them is, with the start and end of the string counting as non-word. Since " is not a word character, the pattern matches only when the opening quote is not preceded by a word character and the closing quote is not followed by one. Between those two quotes, [^"]* matches any non-quote character, no matter what script it is in. The word/non-word distinction is ASCII only, though, so every non-ASCII letter counts as a non-word character, and I'm guessing the \B assertions are what is causing you issues.
It's not clear from this alone exactly what it is you want to achieve. Can you describe the strings that you want to match, and some examples of strings that you don't want to match?
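To see how the \B assertions behave, here is a minimal sketch that runs the pattern from the question against a few made-up inputs. The helper name extractQuoted and the sample strings are assumptions for illustration only, not something taken from the question's data.

```dart
/// Hypothetical helper: returns every substring matched by the
/// pattern from the question.
List<String> extractQuoted(String input) {
  final quoted = RegExp(r'\B"[^"]*"\B');
  return quoted.allMatches(input).map((m) => m.group(0)!).toList();
}

void main() {
  // Made-up sample inputs (assumptions, not real response data).
  final samples = [
    ': "日本語" ,', // quotes next to spaces: both sides non-word, so \B holds
    'a"hello"b', // quotes next to ASCII letters: word boundary, so \B fails
    'я"hello"я', // Cyrillic letters are non-word for \B, so \B holds
  ];

  for (final s in samples) {
    print('$s -> ${extractQuoted(s)}');
  }
}
```

The first and third inputs yield one match each, while the second yields none. The content between the quotes never affects the result; only the characters directly outside the quotes do.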
Upvotes: 0
Reputation: 76333
To match all non-ASCII characters you can use RegExp(r'[^\x00-\x7F]').
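As a rough sketch of how that character class could be used, the snippet below collects runs of consecutive non-ASCII characters. The function name nonAsciiRuns, the added + quantifier, and the sample text are my own assumptions for illustration.

```dart
/// Hypothetical helper: returns each run of consecutive non-ASCII
/// characters found in [input]. The + quantifier is an addition to
/// the character class above so that whole runs are returned
/// instead of single code units.
List<String> nonAsciiRuns(String input) {
  final nonAscii = RegExp(r'[^\x00-\x7F]+');
  return nonAscii.allMatches(input).map((m) => m.group(0)!).toList();
}

void main() {
  // Made-up sample text mixing ASCII and non-ASCII words.
  final runs = nonAsciiRuns('abc 日本語 def привет');
  print(runs); // the two non-ASCII words, split at the ASCII spaces
}
```

Note that this finds non-ASCII text anywhere in the input, quoted or not, so it would still need to be combined with the quote handling if only quoted phrases are wanted.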
Upvotes: 4