Matteo

Reputation: 346

Find files with non-printing characters (null bytes)

My application's log has a field that contains strange characters. I only see these characters when I view the file with the less command.

When I copy the output of my command into a text file, this is what I see:

CTP_OUT=^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

I'd like to know if there is a way to find these null characters. I tried a grep command, but it didn't show anything.

Upvotes: 7

Views: 10740

Answers (2)

kvantour

Reputation: 26481

I can hardly believe it: I am about to write an answer involving cat!

The characters you are observing are non-printable characters, which are often written in caret notation, a way to visualize characters that have no printable form. As mentioned in the question, ^@ is the representation of the NUL character (byte value 0).

If your file has non-printable characters, you can visualize them using cat -vET:

-E, --show-ends: display $ at end of each line
-T, --show-tabs: display TAB characters as ^I
-v, --show-nonprinting: use ^ and M- notation, except for LFD and TAB

source: man cat

Since -v leaves line feeds and tabs untouched, I've added the -E and -T flags so that those are made visible as well.
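To see what the three flags do together, here is a minimal sketch assuming GNU cat. The sample input (a TAB, a NUL byte, and a trailing newline) is made up for illustration:

```shell
# Hypothetical one-line sample: "a", TAB, "b", NUL, "c", newline.
visible=$(printf 'a\tb\000c\n' | cat -vET)

# TAB is shown as ^I (-T), NUL as ^@ (-v), and the line end as $ (-E).
printf '%s\n' "$visible"
```

This prints a^Ib^@c$, which is the same notation less shows for the log in the question.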

grep writes non-printable characters to its output as raw bytes and does not convert them into a visible form, so you have to pipe its output to cat to see them.

Show all lines with non-printable characters:

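$ grep -aE '[^[:print:]]' --color=never file | cat -vET

Here, the ERE [^[:print:]] selects all non-printable characters. Note that TAB is not in [:print:], so lines containing tabs match as well. The -a option makes GNU grep treat the file as text; without it, a file containing NUL bytes is considered binary and grep only reports that the binary file matches.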

Show all lines with NULL:

$ grep -Pa '\x00' --color=never file | cat -vET

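Be aware that we need Perl-compatible regular expressions (-P, a GNU grep extension) here, as they understand the hexadecimal and octal notation. The -a option makes grep process the file as text even though it contains NUL bytes.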

Various control characters can be written in C language style: \n matches a newline, \t a tab, \r a carriage return, \f a form feed, etc.

More generally, \nnn, where nnn is a string of three octal digits, matches the character whose native code point is nnn. You can easily run into trouble if you don't have exactly three digits. So always use three, or since Perl 5.14, you can use \o{...} to specify any number of octal digits.

Similarly, \xnn, where nn are hexadecimal digits, matches the character whose native ordinal is nn. Again, not using exactly two digits is a recipe for disaster, but you can use \x{...} to specify any number of hex digits.

source: Perl 5 version 26.1 documentation
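The notations quoted above can be compared directly. This is a minimal sketch, assuming GNU grep built with PCRE support; the sample file and the variable names are hypothetical:

```shell
# Hypothetical sample: three lines, only the middle one contains a NUL byte.
sample=$(mktemp)
printf 'one\ntw\000o\nthree\n' > "$sample"

# -c counts matching lines instead of printing them.
hex=$(grep -cPa '\x00' "$sample")      # exactly two hex digits
octal=$(grep -cPa '\000' "$sample")    # exactly three octal digits
braced=$(grep -cPa '\x{0}' "$sample")  # braced hex, any number of digits

printf '%s %s %s\n' "$hex" "$octal" "$braced"
rm "$sample"
```

All three patterns should report the same count, since they describe the same byte.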

An example:

$ printf 'foo\012\011\011bar\014\010\012foobar\012\011\000\013\000car\012\011\011\011\012' > test.txt
$ cat test.txt
foo
                bar
                   
foobar
    
        car

If we now use grep alone, we get the following:

$ grep -Pa '\x00' --color=never test.txt
        
        car

But piping it to cat allows us to visualize the control characters:

$ grep -Pa '\x00' --color=never test.txt | cat -vET
^I^@^K^@car$

Why --color=never: If your grep is configured with --color=auto or --color=always, it adds escape sequences that the terminal interprets as colors. These show up in the cat -vET output as extra control characters and can be mistaken for part of the file's content.

$ grep -Pa '\x00' --color=always test.txt | cat -vET
^I^[[01;31m^[[K^@^[[m^[[K^K^[[01;31m^[[K^@^[[m^[[Kcar$
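The commands above print the matching lines. Since the title asks about finding files, the same pattern can also list file names. This is a rough sketch, assuming GNU grep with PCRE support; the scratch directory and file names are invented for the example:

```shell
# Hypothetical scratch directory: one clean file, one containing NUL bytes.
dir=$(mktemp -d)
printf 'plain text\n' > "$dir/clean.log"
printf 'CTP_OUT=\000\000\000\n' > "$dir/dirty.log"

# -r: recurse into the directory, -l: print only the names of matching files,
# -P: Perl-compatible regex, -a: treat binary data as text.
matches=$(grep -rlPa '\x00' "$dir")
printf '%s\n' "$matches"

# -c: count the lines containing NUL in a single file.
count=$(grep -cPa '\x00' "$dir/dirty.log")
printf '%s\n' "$count"

rm -r "$dir"
```

Only dirty.log is listed, and it has one line containing NUL bytes.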

Upvotes: 11

Paul Hodges

Reputation: 15273

sed can.

 sed -n '/\x0/ { s/\x0/<NUL>/g; p}' file

-n skips printing any output unless explicitly requested.
/\x0/ selects for only lines with null bytes.
{...} encapsulates multiple commands, so that they can be collectively applied always and only when the /\x0/ has detected a null on the line.
s/\x0/<NUL>/g; substitutes in a new, visible value for the null bytes. You could make it whatever you want - I used <NUL> as something both reasonably obvious and yet unlikely to occur otherwise. You should probably grep the file for it first to be sure the pattern doesn't exist before using it.
p prints the lines that have been edited (because they had a null byte). Note that the \x0 escape is a GNU sed feature.

This basically makes sed an effective grep for nulls.
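As a quick check of the command above, here is a small sketch assuming GNU sed and GNU grep. The sample file is hypothetical, and the first step follows the advice to confirm that the placeholder does not already occur in the data:

```shell
# Hypothetical sample file: only the second line contains NUL bytes.
file=$(mktemp)
printf 'ok\nCTP_OUT=\000\000\n' > "$file"

# Check that the placeholder is not already present
# (-q: quiet, -a: treat binary data as text, -F: fixed string).
if grep -qaF '<NUL>' "$file"; then
    placeholder_free=no
else
    placeholder_free=yes
fi

# Print only the lines with NUL bytes, with each NUL made visible.
result=$(sed -n '/\x0/ { s/\x0/<NUL>/g; p}' "$file")
printf '%s\n' "$result"
rm "$file"
```

The line without NUL bytes is skipped, and the other one comes out as CTP_OUT=<NUL><NUL>.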

Upvotes: 7
