pfmasse60

Reputation: 1

Assign output from find without word splitting

When I run the bash command

myarray="(`find -type d -printf '%d\t%P\n' | cut -f2`)"

in my present working directory and then print the contents of myarray with

tLen=${#myarray[@]}

for (( i=0; i<${tLen}; i++ ))
do
        echo "${myarray[$i]}"
done

directory names containing whitespace get split. That is, the spaces in the directory name 'My tax documents' aren't automatically escaped, so the name ends up as three entries in the array ('My', 'tax', 'documents') rather than just one. However, running

find -type d -printf '%d\t%P\n' | cut -f2 

from the command line works just fine. How do I prevent word splitting when assigning the output of find into an array?

Upvotes: 0

Views: 90

Answers (1)

Charles Duffy

Reputation: 295687

On Doing It Right

You can't safely use a newline as the trailing delimiter after an arbitrary filename: Filenames can contain newlines.
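
To make the ambiguity concrete, here is a minimal sketch, assuming GNU find (for -printf) and a throwaway directory from mktemp. The directory name one\ntwo is purely illustrative. It counts records for a single directory, first with newline terminators and then with NUL terminators:

```bash
#!/usr/bin/env bash
# Scratch area holding ONE directory whose name contains a newline (hypothetical name).
tmp=$(mktemp -d)
mkdir "$tmp"/$'one\ntwo'

# Newline-terminated: count the lines find emits for that single directory.
newline_count=$(find "$tmp" -mindepth 1 -type d -printf '%P\n' | wc -l)

# NUL-terminated: drop everything except NUL bytes, then count them.
nul_count=$(find "$tmp" -mindepth 1 -type d -printf '%P\0' | tr -cd '\0' | wc -c)

echo "newline-terminated records: $newline_count"
echo "NUL-terminated records: $nul_count"
rm -rf "$tmp"
```

The newline-based count overstates the number of directories, because nothing distinguishes a newline inside a name from a newline that ends a record. NUL cannot appear in a filename, so the second count matches reality.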

The code below uses an unambiguous delimiter, and a read mechanism that works correctly with all possible filenames:

myarray=( )
while IFS= read -r -d $'\t' depth && IFS= read -r -d '' filename; do
  printf 'Found filename %q at depth %d\n' "$filename" "$depth" >&2
  myarray+=( "$filename" )
done < <(find . -type d -printf '%d\t%P\0')

# and to demonstrate reading from the array:
echo "Reiterating that list of filenames:" >&2
printf -- '- %q\n' "${myarray[@]}"

Note that we're calling read twice: once to read up to the first tab after the depth, and once to read up to the following NUL. One could get almost the same effect with IFS=$'\t' read -r -d '' depth filename, but leading and trailing tabs in filenames could get lost.
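
As a rough illustration of that difference, the sketch below feeds one hand-built record (not real find output) to both forms of read. The name and the variable names are hypothetical, chosen so the name begins and ends with a tab:

```bash
#!/usr/bin/env bash
# Hypothetical record shaped like find's '%d\t%P\0' output.
name=$'\tfoo\t'

# Single read: tab is IFS whitespace here, so tabs around the name are stripped.
IFS=$'\t' read -r -d '' depth_single name_single < <(printf '%s\t%s\0' 1 "$name")

# Two reads: only the first tab is consumed as a delimiter; the rest is kept verbatim.
{
  IFS= read -r -d $'\t' depth_double && IFS= read -r -d '' name_double
} < <(printf '%s\t%s\0' 1 "$name")

printf 'single read: %q\n' "$name_single"
printf 'two reads:   %q\n' "$name_double"
```

With the single read, the tabs surrounding the name are treated as field separators and discarded; with two reads, the name comes back byte for byte.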


On What Went Wrong

  • find -type d -printf '%d\t%P\n' | cut -f2 doesn't create a correct list of filenames in the first place. Try creating a directory with mkdir $'foo\tbar\nbaz\tqux' to have a particularly fun time here (the literal newline in the name will be emitted by the %P format specifier, causing baz to be in the position otherwise containing the depth integer, and qux to show up as part of what looks like a completely separate filename).
  • By default, spaces, tabs, and newlines are all part of IFS, and thus are all used for string-splitting.
  • The syntax

    foo="(`...`)"
    

    ...does not actually create an array at all; it creates a string which starts with ( as its first character and ends with ).

  • String splitting runs glob expansion in conjunction, so if you have a file named * (as created by touch '*'), that name would be replaced with a list of files in the current directory (thus causing other names to be represented twice).
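
The last three points can be reproduced without find at all. The following is only a sketch: the output variable is a hypothetical stand-in for one line of find output, and the scratch directory and its file names are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for one line of find output.
output='My tax documents'

as_string="($output)"   # same shape as foo="(`...`)": a plain string, not an array
unquoted=( $output )    # unquoted: split on IFS into separate words
quoted=( "$output" )    # quoted: stays a single element

# Glob expansion: in a scratch directory holding files named '*', a and b,
# the unquoted word * is replaced by every name in the directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch ./'*' a b
globbed=( $(printf '%s\n' *) )
cd - >/dev/null || exit 1
rm -rf "$tmp"

declare -p as_string
echo "unquoted: ${#unquoted[@]} elements, quoted: ${#quoted[@]} element"
echo "globbed:  ${#globbed[@]} elements from 3 files"
```

Comparing the element counts shows each failure separately: the parenthesized string is never an array, the unquoted expansion splits one name into several words, and the file named * pulls the other names in a second time.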

Upvotes: 3
