forthrin

Reputation: 2777

Regexp matching all unicode characters except alphabetic characters

How do I grep a UTF-8 text file for lines containing any character outside ASCII, except for a select few characters, e.g. [æÆøØåÅ]?

So the following three lines:

ABC
ÆØÅ
ABC-ÆØÅ 😃

Should yield:

ABC-ÆØÅ 😃

Only the last line matches, because the smiley is outside ASCII and is not one of the extra ignored characters.

Upvotes: 1

Views: 404

Answers (2)

forthrin

Reputation: 2777

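GNU grep handles UTF-8 when run with -P (Perl-compatible regular expressions) in a UTF-8 locale. The following solves the problem on OS X, where Homebrew installs GNU grep as ggrep.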

brew install homebrew/dupes/grep
ggrep -P '[^\x00-\x7FæÆøØåÅ]' *.txt

Upvotes: 0

nwellnhof

Reputation: 33618

Not every grep handles UTF-8 in a pattern like this. Try Perl:

perl -CSD -Mutf8 -ne 'print if /[^\x00-\x7FæÆøØåÅ]/' [FILE...]

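-CSD enables UTF-8 on the standard streams and makes it the default layer for files that Perl opens. -Mutf8 tells Perl that the source code, here the pattern in the one-liner, is UTF-8 encoded.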

Upvotes: 1
