Wakan Tanka

Reputation: 8052

Reimplementing find's -print0 option in bash and awk (perl)

I have successfully written the following function:

function print0(){
  stdin=$(cat);
  echo "$stdin" | awk 'BEGIN {ORS="\000";}; { print $0}';
}

which works like find's -print0 argument, but for any command that pipes its output to this function. It is useful with xargs -0. Then I realized that the opposite of this function would be useful too, so I tried the following:

function read0(){
  stdin=$(cat);
  echo "$stdin" | awk 'BEGIN {RS="\000"; ORS="\n";};  {print $0}';

  # EQUIVALENTS:
  # echo "$stdin" | perl -nle '@a=join("\n", split(/\000/, $_)); print "@a"'
  # echo "$stdin" | perl -nle '$\="\n"; @a=split(/\000/, $_); foreach (@a){print $_;}'
}

But it does not work. Interestingly, when I ran just the awk or perl commands directly, they worked like a charm:

# WORKING
ls | print0 | awk 'BEGIN {RS="\000"; ORS="\n";};  {print $0}'
ls | print0 | perl -nle '@a=join("\n", split(/\000/, $_)); print "@a"'
ls | print0 | perl -nle '$\="\n"; @a=split(/\000/, $_); foreach (@a){print $_;}'


# NOT WORKING
ls | print0 | read0

What am I doing wrong? I assume something goes wrong with the null characters in the following command: stdin=$(cat);

EDIT: Thank you all. The conclusion is that bash variables cannot hold a NUL character. PS: the commands above were just an example; I know there is no practical reason to convert NULs to newlines and vice versa.

Upvotes: 0

Views: 1299

Answers (2)

user2719058

Reputation: 2233

I would say that your implementation can be simplified to:

function print0 { tr '\n' '\0'; }
function read0  { tr '\0' '\n'; }

which works as you want.

But it adds no value: you just switch from newline-separated records to NUL-separated records and vice versa, whereas find ... -print0 can handle filenames that contain newlines. Your idea doesn't solve that problem.
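To make the point above concrete, here is a small sketch that counts records for a directory containing one ordinary name and one name with an embedded newline. It assumes GNU or BSD ls, which write such a name literally when stdout is a pipe, and a find that supports -mindepth (GNU and BSD do; it is not POSIX). The file names are made up for the demo.

```shell
#!/bin/bash
# print0 as simplified in this answer: every newline becomes a NUL.
print0() { tr '\n' '\0'; }

dir=$(mktemp -d)
# Hypothetical test files: one ordinary name, one containing a newline.
touch -- "$dir/plain" "$dir/$(printf 'two\nlines')"

# ls writes the embedded newline as-is when piped, so print0 turns
# that single name into two NUL-terminated records (3 in total).
ls_records=$(( $(ls -- "$dir" | print0 | tr -cd '\0' | wc -c) ))

# find -print0 emits exactly one NUL per path it prints (2 in total).
find_records=$(( $(find "$dir" -mindepth 1 -print0 | tr -cd '\0' | wc -c) ))

echo "ls | print0: $ls_records records, find -print0: $find_records records"
rm -rf -- "$dir"
```

The newline-to-NUL conversion cannot tell a separator from a newline that belongs to a name, which is why only the find side reports the right count.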

The practical side of your question (how to handle strings with embedded NUL characters in bash) has been discussed on SO: assign string containing null-character (\0) to a variable in bash. The bottom line is that you have to escape them. Other than that, zsh supports embedded NUL characters, but apparently no other shell does.

There has been a related discussion on bug-bash about the handling of NUL characters by the read shell builtin, which you may find interesting.
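For reference, a read0 can also be written with the read builtin alone, without any variable ever holding a NUL. The sketch relies on the documented bash behavior that `read -d ''` ends each line at a NUL byte; it needs bash rather than plain sh.

```shell
#!/bin/bash
# read0 using only bash builtins: read -d '' stops at each NUL byte,
# so every record lands in $rec without its terminator.
read0() {
  local rec
  # "|| [[ -n $rec ]]" keeps a final record that lacks a trailing NUL.
  while IFS= read -r -d '' rec || [[ -n $rec ]]; do
    printf '%s\n' "$rec"
  done
}

printf 'file one\0file_two\0' | read0
```

IFS= preserves leading and trailing whitespace in each record, and -r keeps backslashes literal.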

Upvotes: 3

Digital Trauma

Reputation: 16016

As the other answers/comments mention, you can't put a null character in a bash string variable. However if you can get rid of the variables and just handle the data in pipes/streams, then you can pass null characters through just fine:

function print0() {
  awk 'BEGIN {ORS="\000";}; {print $0}';
}

function read0() {
  awk 'BEGIN {RS="\000"; ORS="\n";};  {print $0}';
}
ubuntu@ubuntu:~/dir$ ls -1
file one
file_two
ubuntu@ubuntu:~/dir$ ls | print0 | read0
file one
file_two
ubuntu@ubuntu:~/dir$ 
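A quick way to see the difference between variables and pipes is to send the same three bytes through each. This is a minimal demo, not part of the functions above; bash 4.4 and later also print a warning on stderr about the ignored null byte.

```shell
#!/bin/bash
# Three bytes: 'a', NUL, 'b'.
# Through a variable: command substitution silently drops the NUL.
v=$(printf 'a\0b')
var_len=${#v}

# Through a pipe: wc -c counts every byte, NUL included.
pipe_len=$(( $(printf 'a\0b' | wc -c) ))

echo "variable: $var_len bytes, pipe: $pipe_len bytes"
# prints "variable: 2 bytes, pipe: 3 bytes"
```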

Also, using ls in this way is dangerous, because it won't work for filenames that contain newlines. As far as I'm aware, find is the way to programmatically get a list of files in a directory when odd characters appear in filenames.
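As an illustration of the find route, the usual pairing is -print0 with xargs -0, which the question also mentions. The sketch assumes a find with -mindepth/-maxdepth (GNU and BSD both have them) and uses made-up file names; the inner sh just reports how many arguments it received.

```shell
#!/bin/bash
dir=$(mktemp -d)
# Hypothetical test files with a space and a newline in their names.
touch -- "$dir/file one" "$dir/$(printf 'two\nlines')"

# -print0 ends each path with a NUL and xargs -0 splits only on NUL,
# so the inner sh receives exactly one argument per file.
count=$(find "$dir" -mindepth 1 -maxdepth 1 -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "arguments received: $count"
rm -rf -- "$dir"
```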


Update:

Here's another way to programmatically get a list of files in a directory when odd characters appear in filenames, without using find (or the flawed ls). We can use a * glob to get the list of all files in the directory into a bash array. Then we print each member of the array, using one byte of /dev/zero (a NUL) as the delimiter:

#!/bin/bash

shopt -s nullglob
shopt -s dotglob    # display .files as well

dirarray=( * )

for ((i = 0 ; i < ${#dirarray[@]}; i++)); do
    [ "$i" != "0" ] && head -c1 /dev/zero   # NUL separator between names
    printf '%s' "${dirarray[$i]}"           # '%s' so % and \ in names are printed literally
done
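The loop can also be collapsed into one printf call, because printf reuses its format for each remaining argument. This is a variant sketch; list0 is a hypothetical name, and it differs from the loop in one respect: the last name also gets a trailing NUL, which matches what find -print0 produces.

```shell
#!/bin/bash
# Hypothetical helper: list0 [dir] prints each entry of dir (default .)
# followed by a NUL. The ( ) body runs in a subshell, so cd and shopt
# do not leak into the caller.
list0() (
  cd -- "${1:-.}" || exit 1
  shopt -s nullglob dotglob   # empty dir -> empty array; include .files
  names=( * )
  # printf repeats '%s\0' once per name. Guard the empty case: with no
  # arguments printf would still emit one lone NUL.
  (( ${#names[@]} )) && printf '%s\0' "${names[@]}"
)
```

The names are printed relative to the listed directory, just like the glob in the loop version.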

Upvotes: 1
