Shreyas Athreya

Reputation: 113

Read filenames with embedded whitespace into an array in a shell script

I'm using the find command to search for a file with a multi-word name that is present in many directories, and storing the output in a variable, vari:

    vari = `find -name "multi word file.xml"`

When I try to delete the files by iterating over the result with a for loop,

    for file in ${vari[@]}

the execution fails with:

    rm: cannot remove `/abc/xyz/multi':: No such file or directory

Could you please help me with this scenario?

Upvotes: 4

Views: 2547

Answers (3)

Jay jargot

Reputation: 2868

The solutions are all line-based. At the bottom is a test environment for which there is no known solution.

As already written, the file could be removed with this tested command:

$ find . -name "multi word file".xml -exec rm {} +

I did not manage to use the rm command with a variable when the path or filename contains a newline (\n).

Test environment:

$ mkdir "$(printf "\1\2\3\4\5\6\7\10\11\12\13\14\15\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37\40\41\42\43\44\45\46\47testdir" "")"
$ touch "multi word file".xml
$ mv *xml *testdir/
$ touch "2nd multi word file".xml ; mv *xml *testdir
$ ls -b
\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\ !"#$%&'testdir
$ ls -b *testdir
2nd\ multi\ word\ file.xml  multi\ word\ file.xml

Upvotes: 0

Cole Tierney

Reputation: 10314

Here are a few approaches:

# change the input field separator to a newline to ignore spaces
IFS=$'\n'
for file in $(find . -name '* *.xml'); do
    ls "$file"
done

# pipe find result lines to a while loop
# (IFS= is scoped to read, so the shell's IFS is left unchanged)
find . -name '* *.xml' | while IFS= read -r file; do
    ls "$file"
done

# feed the while loop with process substitution
# (IFS= is scoped to read, so the shell's IFS is left unchanged)
while IFS= read -r file; do
    ls "$file"
done < <(find . -name '* *.xml')

When you're satisfied with the results, replace ls with rm.

Upvotes: 2

mklement0

Reputation: 438028

  • If you really need to capture all file paths in an array up front (assumes bash, primarily due to use of arrays and process substitution (<(...))[1]; a POSIX-compliant solution would be more cumbersome[2]; also note that this is a line-based solution, so it won't handle filenames with embedded newlines correctly, but that's very rare in practice):
# Read matches into array `vari` - safely: no word splitting, no
# globbing. The only caveat is that filenames with *embedded* newlines
# won't be handled correctly, but that's rarely a concern.
# bash 4+:
readarray -t vari < <(find . -name "multi word file.xml")
# bash 3:
IFS=$'\n' read -r -d '' -a vari < <(find . -name "multi word file.xml")

# Invoke `rm` with all array elements:
rm "${vari[@]}"  # !! The double quotes are crucial.
  • Otherwise, let find perform the deletion directly (these solutions also handle filenames with embedded newlines correctly):
find . -name "multi word file.xml" -delete

# If your `find` implementation doesn't support `-delete`:
find . -name "multi word file.xml" -exec rm {} +
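If you need the paths in an array and also want to cope with embedded newlines, one option is to have find emit NUL-delimited output and read it with read -d ''. The following is only a sketch: it assumes bash and a find that supports -print0 (GNU and BSD find do), and the scratch directory and file layout are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: collect matches into an array from NUL-delimited find output.
# Assumptions: bash, and a find that supports -print0.
# The scratch directory below is hypothetical test data.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"$'\n'"dir"          # directory name with an embedded newline
touch "$tmpdir/multi word file.xml" \
      "$tmpdir/sub"$'\n'"dir/multi word file.xml"

vari=()
# IFS= keeps leading/trailing whitespace; -d '' makes NUL the record delimiter.
while IFS= read -r -d '' path; do
  vari+=("$path")
done < <(find "$tmpdir" -name "multi word file.xml" -print0)

count=${#vari[@]}
echo "found $count file(s)"

# Delete all collected paths; the double quotes keep each path intact.
rm -- "${vari[@]}"

# Verify nothing is left, then clean up the scratch directory.
remaining=$(find "$tmpdir" -name "multi word file.xml" | wc -l | tr -d ' ')
rm -rf "$tmpdir"
```

The loop runs in the current shell because it is fed by process substitution, so vari is still populated after the loop ends.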

As for what you tried:

  • vari=`find -name "multi word file.xml"` (I've removed the spaces around =; with them, the shell would try to run a command named vari instead of performing an assignment) does not create an array; such a command substitution returns the stdout output from the enclosed command as a single string (with trailing newlines stripped).

    • By enclosing the command substitution in ( ... ), you could create an array:
      vari=( `find -name "multi word file.xml"` ),
      but that would perform word splitting on the find's output and not properly preserve filenames with spaces.
    • While this could be addressed with IFS=$'\n' so as to only split at line boundaries, the resulting tokens are still subject to pathname expansion (globbing), which can inadvertently alter the file paths.
    • While this could also be addressed with a shell option (set -f, which turns off globbing), you now have 2 settings you need to set ahead of time and restore to their original values afterwards; thus, using readarray or read as demonstrated above is the simpler choice.
  • Even if you did manage to collect the file paths correctly in $vari as an array, referencing that array as ${vari[@]} - without double quotes - would break, because the resulting strings are again subject to word splitting, and also pathname expansion (globbing).

    • To safely expand an array to its elements without any interpretation of its elements, double-quote it: "${vari[@]}"
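
To see the difference between the two expansions, here is a small sketch. It assumes bash with the default IFS; the two paths and the helper function count_args are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare unquoted and double-quoted array expansion.
# The paths and the helper count_args are hypothetical.
vari=("./a dir/multi word file.xml" "./b/multi word file.xml")

# Print the number of arguments received.
count_args() { echo "$#"; }

unquoted=$(count_args ${vari[@]})     # each element is split on whitespace
quoted=$(count_args "${vari[@]}")     # each element stays a single argument

echo "unquoted: $unquoted, quoted: $quoted"   # prints "unquoted: 7, quoted: 2"
```

The unquoted form hands rm seven fragments such as ./a and dir/multi, which is exactly the kind of argument that produced the "cannot remove" error in the question.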

[1]

Process substitution rather than a pipeline is used so as to ensure that readarray / read is executed in the current shell rather than in a subshell.

As eckes points out in a comment, if you were to try find ... | IFS=$'\n' read ... instead, read would run in a subshell, which means that the variables it creates will disappear (go out of scope) when the command returns and cannot be used later.
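
The subshell effect can be sketched as follows. This assumes bash with its default settings (in particular, the lastpipe option is off); the input lines are made up, and a while loop stands in for the read / readarray calls above.

```shell
#!/usr/bin/env bash
# Sketch: a loop fed by a pipeline runs in a subshell; one fed by process
# substitution runs in the current shell. Assumes bash defaults (lastpipe off).
lines=()
printf '%s\n' "one two" "three" | while IFS= read -r line; do
  lines+=("$line")                 # modifies the subshell's copy only
done
piped=${#lines[@]}                 # still 0 in the current shell

lines=()
while IFS= read -r line; do
  lines+=("$line")                 # modifies the current shell's array
done < <(printf '%s\n' "one two" "three")
direct=${#lines[@]}                # 2

echo "pipeline: $piped, process substitution: $direct"
```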

[2]

The POSIX shell spec. supports neither arrays nor process substitution (nor readarray, nor any read options other than -r); you'd have to implement line-by-line processing as follows:

while IFS='
' read -r vari; do
  printf '%s\n' "$vari"  # process each path here
done <<EOF
$(find . -name "multi word file.xml")
EOF

Note the required actual newline between IFS=' and ' in order to assign a newline, given that the $'\n' syntax is not available.
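
If the per-file work can be done in a child shell, another approach that avoids parsing lines altogether is to let find pass the paths as arguments to sh -c. This is a sketch rather than a drop-in replacement for the loop above: variables set inside the child shell are not visible to the calling script, and the scratch directory (created with mktemp, which is widely available but not itself specified by POSIX) is made up for the demonstration.

```shell
#!/bin/sh
# Sketch: process each match in a child shell, with paths passed as arguments.
# The scratch directory and its contents are hypothetical test data.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/a b"
touch "$tmpdir/multi word file.xml" "$tmpdir/a b/multi word file.xml"

# "for path do" iterates over the positional parameters; the "sh" after the
# script text becomes $0, and find substitutes the matched paths for {}.
find "$tmpdir" -name "multi word file.xml" -exec sh -c '
  for path do
    printf "removing: %s\n" "$path"
    rm -- "$path"
  done
' sh {} +

# Verify nothing is left, then clean up the scratch directory.
remaining=$(find "$tmpdir" -name "multi word file.xml" | wc -l | tr -d " ")
rm -rf "$tmpdir"
```

Because the paths never pass through word splitting or line-based reading, this also copes with spaces and newlines in file names.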

Upvotes: 6
