xXx_CodeMonkey_xXx

Reputation: 810

Regex: find all words with non-Latin characters

How can I find all words that contain at least one non-Latin letter (Arabic, Chinese, ...) using the regex.h library?

cityدبي

Upvotes: 0

Views: 4944

Answers (3)

Toto

Reputation: 91438

How about:

(?=\pL)(?![a-zA-Z])

This matches at any position that is followed by a letter, in any alphabet, that is not a Latin letter:

not ok - cityدبي
ok - city
not ok - دبي
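
Note that `\pL` and lookaheads are Perl/PCRE features that POSIX regex.h does not define, so with regex.h a bracket expression built from POSIX character classes is one possible substitute. Below is a minimal sketch, assuming UTF-8 input and the default "C" locale (no setlocale call), where every byte of a non-ASCII character falls outside the ASCII classes. The function name contains_non_ascii_char is made up for illustration. It flags any non-ASCII character, so accented Latin letters such as é are reported too, not only non-Latin letters.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread).
   Returns 1 if word contains a byte that is not an ASCII letter, digit,
   punctuation, whitespace, or control character; 0 if it does not;
   -1 if the pattern fails to compile.
   Assumption: default "C" locale and UTF-8 input, so a match means a
   byte >= 0x80, which only occurs inside non-ASCII characters. */
int contains_non_ascii_char(const char *word) {
    assert(word != NULL);
    regex_t re;
    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, "[^[:alnum:][:punct:][:space:][:cntrl:]]",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int found = (regexec(&re, word, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

Calling it on each word of the input separates words like "city" from words that contain Arabic or Chinese characters.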

Upvotes: 2

DhruvPathak

Reputation: 43245

Try this:

[a-zA-Z]*[^A-Za-z \d]+[a-zA-Z]*

Meaning: one or more characters that are not Latin letters, spaces, or digits, preceded and followed by zero or more Latin letters, i.e. a word containing at least one non-Latin character. See the demo with some random text: http://regexr.com?326s3

You may need to adjust this regex to your needs and account for things like digits, special characters, and word boundaries, depending on your input.
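
If the pattern has to run under regex.h, keep in mind that `\d` is not a shorthand inside a POSIX bracket expression, so classes like `[:digit:]` or `[:alnum:]` are needed instead. Here is a rough sketch of pulling whole words out of a text that way. It assumes UTF-8 input, the default "C" locale, and whitespace-delimited words. The function name print_non_ascii_words is hypothetical, and "non-Latin" is approximated as "contains a non-ASCII byte".

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not from the thread).
   Prints every whitespace-delimited word of text that contains at least
   one byte outside the ASCII classes, one word per line.
   Returns the number of such words, or -1 if regcomp fails.
   Assumption: default "C" locale and UTF-8 input. */
int print_non_ascii_words(const char *text) {
    assert(text != NULL);
    static const char *pattern =
        "[^[:space:]]*"                            /* start of the word */
        "[^[:alnum:][:punct:][:space:][:cntrl:]]"  /* one non-ASCII byte */
        "[^[:space:]]*";                           /* rest of the word */
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    int count = 0;
    regmatch_t m;
    const char *p = text;
    /* POSIX matching is leftmost-longest, so each match spans a whole word. */
    while (regexec(&re, p, 1, &m, 0) == 0) {
        printf("%.*s\n", (int)(m.rm_eo - m.rm_so), p + m.rm_so);
        count++;
        p += m.rm_eo;  /* continue searching after this word */
    }
    regfree(&re);
    return count;
}
```

The pattern always consumes at least one byte per match, so the loop cannot stall on an empty match.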

Upvotes: 0

frogwang

Reputation: 64

Just use [^a-zA-Z]: if it matches, the word should contain an international character...

Upvotes: -1
