Lazer

Reputation: 94820

Removing non-displaying characters from a file

$ cat weirdo 
Lunch now?
$ cat weirdo | grep Lunch
$ vi weirdo
  ^@L^@u^@n^@c^@h^@ ^@n^@o^@w^@?^@

I have some files that contain text with some non-printing characters like ^@ which cause my greps to fail (as above).

How can I get my grep to work? Is there a way that does not require altering the files?

Upvotes: 3

Views: 5158

Answers (4)

user123444555621

Reputation: 152976

The tr command is made for that:

tr -cd '[:print:]\r\n\t' < weirdo | grep Lunch
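To see the effect on a byte stream like the one in the question (a sketch; the sample bytes are reconstructed from the vi display above, a NUL before each character):

```shell
# Recreate the NUL-interleaved bytes shown in vi as ^@L^@u^@n^@c^@h...
printf '\0L\0u\0n\0c\0h\0 \0n\0o\0w\0?\0\n' > weirdo
# -c complements the set, -d deletes: every byte that is NOT printable
# (or \r, \n, \t) is removed, so the NUL bytes disappear.
tr -cd '[:print:]\r\n\t' < weirdo | grep Lunch   # prints: Lunch now?
```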

Upvotes: 5

Jonathan Leffler

Reputation: 753695

It looks like your file is encoded in UTF-16 rather than an 8-bit character set. The '^@' is a notation for ASCII NUL '\0', which usually spoils string matching.

One technique for lossless handling of this would be to use a filter to convert UTF-16 to UTF-8, then run grep on the output. Hypothetically, if the command were called 'utf16-utf8', you'd write:

utf16-utf8 weirdo | grep Lunch
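In practice, iconv(1) can play the role of the hypothetical 'utf16-utf8' filter. The vi display (a NUL before each character) suggests big-endian UTF-16, so -f UTF-16BE is assumed here; use UTF-16LE for the other byte order, or plain UTF-16 if the file starts with a BOM:

```shell
# Convert from UTF-16 (big-endian assumed) to UTF-8, then search as usual.
iconv -f UTF-16BE -t UTF-8 weirdo | grep Lunch
```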

As an appallingly crude approximation to 'utf16-utf8', you could consider:

tr -d '\0' < weirdo | grep Lunch

This deletes ASCII NUL characters from the input file and lets grep operate on the 'cleaned up' output. In theory, it might give you false positives; in practice, it probably won't.

Upvotes: 6

ghostdog74

Reputation: 342363

You can try:

awk '{gsub(/[^[:print:]]/,"") }1' file 
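For example (a sketch using \001 to stand in for the file's non-printing bytes; whether this approach copes with NUL bytes specifically depends on your awk implementation, as some awks truncate input at the first NUL):

```shell
# gsub deletes anything outside the POSIX "print" class on each line;
# the trailing 1 is the idiomatic "print the record" pattern.
printf 'Lu\001nch \001now?\n' | awk '{gsub(/[^[:print:]]/,"")}1'   # prints: Lunch now?
```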

Upvotes: 2

DarkDust

Reputation: 92335

You may have some success with the strings(1) tool like in:

strings file | grep Lunch

See man strings for more details.
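Note that plain strings only reports runs of (by default) four or more consecutive printable characters, so on a NUL-interleaved UTF-16 file like the question's it will find nothing; GNU strings can scan 16-bit encodings with -e b (big-endian) or -e l (little-endian). A sketch on a file with ordinary binary junk, where the default mode does work:

```shell
# Control bytes \001 \002 \003 break the file into printable runs;
# strings emits each run of 4+ printable characters on its own line.
printf 'junk\001\002Lunch now?\003\n' > mixed
strings mixed | grep Lunch   # prints: Lunch now?
```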

Upvotes: 2
