What is the regex to match any Chinese character in R?
[\\p{Han}] doesn't seem to work as expected:
v=c("a","b","c","中","e","文")
grep("[\\p{Han}]",v, value = TRUE)
[1] "a"
Upvotes: 4
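According to regular-expressions.info, "The JGsoft engine, Perl, PCRE, PHP, Ruby 1.9, Delphi, and XRegExp can match Unicode scripts". R's default engine, however, is a modified version of Ville Laurikari's TRE engine (source), which does not support Unicode script classes such as \\p{Han}. Inside the brackets, TRE appears to treat \, p, {, H, a, n and } as literal characters, which is why "a" was the only match. Setting perl = TRUE switches grep to the PCRE engine, which should produce the correct results: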
grep("[\\p{Han}]", v, value = TRUE, perl = TRUE)
#### OUTPUT ####
[1] "中" "文"
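If you need a logical vector or per-string counts rather than the matching values, the same PCRE pattern works with grepl and gregexpr. The sketch below is illustrative, not taken from the original answer: it writes 中 and 文 as \u escapes so the strings are marked as UTF-8 regardless of the source file's encoding, and the extra string "abc\u4e2d\u6587" is a made-up example. Note that the square brackets are not required; "\\p{Han}" on its own is enough.

```r
# Same vector as in the question, with 中 (U+4E2D) and 文 (U+6587) written as escapes,
# plus a hypothetical mixed string to show counting
v <- c("a", "b", "c", "\u4e2d", "e", "\u6587", "abc\u4e2d\u6587")

# TRUE for each element containing at least one Han character (requires perl = TRUE)
has_han <- grepl("\\p{Han}", v, perl = TRUE)

# Keep only the elements that contain Han characters
han_only <- v[has_han]

# Number of Han characters in each element (0 where there is no match)
n_han <- lengths(regmatches(v, gregexpr("\\p{Han}", v, perl = TRUE)))

print(has_han)
print(n_han)
```

Using TRUE rather than T also avoids surprises, because T is an ordinary variable that can be reassigned.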
Upvotes: 3