Sati

Reputation: 726

Regex to match any Chinese character in R

What is the regex to match any Chinese character in R?

[\\p{Han}] doesn't seem to work as expected. Instead of returning the two Chinese characters, it returns only "a":

v <- c("a", "b", "c", "中", "e", "文")
grep("[\\p{Han}]", v, value = TRUE)

[1] "a"

Upvotes: 4

Views: 1684

Answers (1)

user10191355

R's default regex engine is a modified version of Ville Laurikari's TRE engine (source), and TRE does not support Unicode script classes such as \\p{Han}. According to regular-expressions.info, "The JGsoft engine, Perl, PCRE, PHP, Ruby 1.9, Delphi, and XRegExp can match Unicode scripts". Setting perl = TRUE makes R use PCRE instead, so it should produce the correct results:

grep("[\\p{Han}]", v, value = TRUE, perl = TRUE)

#### OUTPUT ####

[1] "中" "文"
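The same perl = TRUE switch works with other base R regex functions. The following is a minimal sketch, assuming a session where R's PCRE build has Unicode support (the usual case). It uses grepl() to get a logical vector and regmatches() with gregexpr() to pull out runs of consecutive Han characters from mixed strings. The sample strings in s are made up for illustration. The square brackets are dropped because a single \\p{Han} does not need a character class.

```r
# Same vector as the question; \u escapes spell 中 (U+4E2D) and 文 (U+6587)
# so the script does not depend on the source file's encoding.
v <- c("a", "b", "c", "\u4e2d", "e", "\u6587")

# grepl() returns TRUE/FALSE per element instead of the matching values.
# perl = TRUE selects PCRE, which understands the \p{Han} script class.
has_han <- grepl("\\p{Han}", v, perl = TRUE)
print(has_han)  # TRUE only for the two Chinese characters

# Hypothetical mixed strings: extract every run of consecutive Han characters.
s <- c("abc \u4e2d\u6587 xyz", "no Chinese here")
runs <- regmatches(s, gregexpr("\\p{Han}+", s, perl = TRUE))
print(runs)  # a list: "中文" for the first string, character(0) for the second
```

Using the + quantifier keeps adjacent characters together as one match; drop it to get each character separately.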

Upvotes: 3

Related Questions