Marcus Leon

Reputation: 56669

Grep regex doesn't work with Cygwin on Windows

I'm trying to find all non-ASCII characters in a file using grep:

grep '[^\x00-\x7F]' myfile

I think this should work, but it returns every line in the file.

Any ideas?

Upvotes: 3

Views: 2107

Answers (3)

Keith Thompson

Reputation: 263257

grep doesn't recognize the \x syntax.

( echo Hello ; printf '%s\n' '\x48' ) | grep '\x48'

prints

\x48

('H' is character 0x48.)

Inside a bracket expression, grep treats the backslash as an ordinary character. Your grep is matching all lines because each line contains a character other than \, x, 0, 7, F, and anything in the range 0 .. \.
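The explanation above can be checked with a small experiment. This is a minimal sketch, assuming GNU grep in the C locale; the two sample lines are made up for illustration. HELLO123 consists only of characters inside the 0 .. \ range, so it is not printed, while hello contains lowercase letters outside that set, so it is.

```shell
# Made-up sample input: both lines are pure ASCII.
# LC_ALL=C makes the range 0-\ follow byte order (0x30 through 0x5C).
result=$(printf '%s\n' 'HELLO123' 'hello' | LC_ALL=C grep '[^\x00-\x7F]')

# Only the line containing a character outside the set is printed.
printf '%s\n' "$result"   # prints: hello
```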

Note that this is not specific to Cygwin.

GNU grep (which is what Cygwin has) has an experimental -P option that tells it to use Perl-like regular expressions; with that option, it does recognize the \x syntax.
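With that option, the pattern from the question should behave as intended. The following is a minimal sketch, assuming a GNU grep built with PCRE support; the sample file and its contents are hypothetical, not taken from the question.

```shell
# Hypothetical sample file: one pure-ASCII line and one line
# ending in "é" (UTF-8 bytes 0xC3 0xA9, written as octal escapes).
sample=$(mktemp)
printf 'plain ascii\ncaf\303\251\n' > "$sample"

# -P selects Perl-compatible regexes, where \x00-\x7F is a hex range;
# -n prefixes each matching line with its line number.
matches=$(grep -nP '[^\x00-\x7F]' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Only the second line should be reported. If grep rejects -P, it was probably built without PCRE support, and the Perl one-liner is the fallback.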

Upvotes: 2

Cajunluke

Reputation: 3113

Grep may be interpreting multibyte (i.e., non-ASCII) characters as several single-byte characters. (For example, if the file is UTF-16, the character ∩ [U+2229] is stored as the bytes 0x22 and 0x29, which would show up as " [U+0022] followed by ) [U+0029].) You'll need to figure out the file's encoding and use a more sophisticated tool that knows Unicode.

Upvotes: 1

Marcus Leon

Reputation: 56669

Found that perl works:

perl -n -e 'print if /[^\x00-\x7F]/' file

Upvotes: 1
