Reputation: 6223
I need to check if some strings contain any non-English characters.
x = c('Kält', 'normal', 'normal with, punctuation ~-+!', 'normal with number 1234')
grep(pattern = ??, x) # Expected output: 1
Upvotes: 4
Views: 1036
Reputation: 321
Expanding on the answer that's already been provided:
To check for non-ASCII characters:
x = c('Kält', 'normal', 'normal punctuation ~-+!', 'normal number 1234')
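# Indices of the elements that contain at least one non-ASCII character
grep(pattern = "[^[:ascii:]]", x, perl=TRUE)
# The matching elements themselves
grep(pattern = "[^[:ascii:]]", x, value=TRUE, perl=TRUE)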
To run the same check with an explicit Unicode code point range (anything outside U+0001 to U+007F):
x = c('Kält', 'normal', 'normal punctuation ~-+!', 'normal number 1234')
grep(pattern = "[^\u0001-\u007F]+", x, perl=TRUE)
grep(pattern = "[^\u0001-\u007F]+", x, value=TRUE, perl=TRUE)
You can also use the stringi package to test whether each string is ASCII; it returns a logical vector rather than indices:
x = c('Kält', 'normal', 'normal punctuation ~-+!', 'normal number 1234')
stringi::stri_enc_isascii(x)
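To get the same kind of output as the grep calls, one option is to negate that result and pass it to which(). This is a minimal sketch that assumes stringi is installed; the variable name is_ascii is only illustrative.

```r
x <- c('Kält', 'normal', 'normal punctuation ~-+!', 'normal number 1234')

# TRUE for elements made up only of ASCII characters
is_ascii <- stringi::stri_enc_isascii(x)

# Indices of the elements with non-ASCII characters (like grep without value=TRUE)
which(!is_ascii)

# The elements themselves (like grep with value=TRUE)
x[!is_ascii]
```

Keep in mind that non-ASCII is only an approximation of non-English: English text with characters such as curly quotes or the ï in "naïve" would be flagged as well.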
Upvotes: 1
Reputation: 627536
You may use the [^[:ascii:]] PCRE regex:
x = c('Kält', 'normal', 'normal with, punctuation ~-+!', 'normal with number 1234')
grep(pattern = "[^[:ascii:]]", x, perl=TRUE)
grep(pattern = "[^[:ascii:]]", x, value=TRUE, perl=TRUE)
Output:
[1] 1
[1] "Kält"
See the R demo
Upvotes: 5