Ryan Schubert

Why is my shell script looping over more files than I want?

I'm writing a shell script for a pipeline I'm building. The code should just grab a list of 10 unique file identifiers from a directory and then begin doing some analysis on them. The code does begin with the 10 files I grab, but then continues to run on the entire directory! The code goes like this:

First, basic user input:

#!/bin/bash

Dir=$1 #needs directory containing the input files

Then grab the 10 unique identifiers into a list:

if [ -e file_list.txt ] #remove any list at the start
then
    rm file_list.txt
fi 
for file in `ls ${Dir}* | cut -f 6 -d '/' | cut -f 1 -d '_' | uniq | head` #grab the first 10 unique files and put them in a text file
do
    echo $file >> file_list.txt #each file set has a unique tag, write that out to the list
done

Now go through the list of files and do stuff:

while read file #now iterate through the list of files
do 
    #do stuff to file here
    ls ${file}* #list every file with this tag; just an example
done < file_list.txt 

I would like to say the culprit is the call to uniq when I grab the 10 file names. Previous versions of this code did not have this issue before I used uniq, but I don't see how, unless it did something strange to my file_list.txt, which looks fine to me.
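If `uniq` is a suspect, note that it only collapses duplicate lines that are adjacent. The question's pipeline feeds it `ls` output, which is sorted by full file name and usually keeps equal tags together, though that depends on the names and on the locale's collation. A minimal illustration with made-up tags:

```shell
#!/usr/bin/env bash
# uniq only drops a line that repeats the line directly above it.
printf '%s\n' ABC XYZ ABC | uniq          # ABC, XYZ, ABC: the second ABC survives
printf '%s\n' ABC XYZ ABC | sort | uniq   # ABC, XYZ
printf '%s\n' ABC XYZ ABC | sort -u       # same result in a single command
```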

Could the error be in how I'm working with the files in my third code block?
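Probably not the cause of the extra files, but the `while read file` loop in the third block has two quirks worth ruling out: plain `read` strips leading and trailing whitespace and treats backslashes as escapes. The sketch below uses a temporary list file and two hypothetical helpers, `plain_read` and `raw_read`, to compare it with the common `while IFS= read -r` form.

```shell
#!/usr/bin/env bash
# Hypothetical list file: one entry with leading spaces, one with a backslash.
list=$(mktemp)
printf '%s\n' '  ABC' 'X\YZ' > "$list"

plain_read() { local file; while read file; do printf '[%s]\n' "$file"; done < "$1"; }
raw_read()   { local file; while IFS= read -r file; do printf '[%s]\n' "$file"; done < "$1"; }

plain_read "$list"   # [ABC] [XYZ]: leading spaces trimmed, backslash consumed
raw_read "$list"     # [  ABC] [X\YZ]: each line kept exactly as written
```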

I ran ShellCheck and got quite a few notes saying "Double quote to prevent globbing and word splitting."
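ShellCheck's note refers to what the shell does with an unquoted expansion: the value is split on whitespace, and each resulting word is then treated as a glob pattern. A small sketch of the splitting half, assuming a hypothetical file name that contains a space:

```shell
#!/usr/bin/env bash
# Hypothetical file name containing a space.
name='run 1_R1.fastq'

printf '[%s]\n' $name     # unquoted: split into two arguments, [run] and [1_R1.fastq]
printf '[%s]\n' "$name"   # quoted: one argument, [run 1_R1.fastq]
```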

Answers (1)

Charles Duffy

With respect to the "why": it's almost impossible to say without knowing your filenames. Any file with a literal * at the front of its name, for example, would be expanded into a list of every other file in the directory by your original code. Instead of tracking down why broken code is broken, it's more sensible to write something that follows best practices in the first place, so you don't need to dig into the winding ways that features which shouldn't be used from scripts at all (such as parsing ls output and leaving expansions unquoted) can play off each other to create unwanted messes.
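The literal-`*` case can be reproduced in a scratch directory. This sketch assumes made-up file names and mirrors a simplified version of the question's `for file in $(ls ... | cut ... | uniq | head)` loop inside a hypothetical `expand_tags` function; the tag `*` is glob-expanded to every file in the current directory.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical names; one file name starts with a literal *.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch '*_1.txt' ABC_1.txt XYZ_1.txt

# Simplified version of the question's loop: unquoted command substitution over ls.
expand_tags() {
  local tag
  for tag in $(ls | cut -f 1 -d '_' | uniq | head); do
    printf 'tag: %s\n' "$tag"
  done
}

expand_tags
```

With the three files above, `expand_tags` prints five lines rather than three: the `*` tag alone contributes all three file names, followed by `ABC` and `XYZ`.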


As I read it, you want to assign each file a tag based on the content before the first _ in its name, and then take the files with the first 10 unique tags.
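The tag rule can be checked on its own. A minimal sketch with a hypothetical `tag_of` helper that uses the same two parameter expansions as the loop below; unlike a fixed `cut -f 6` field, it yields the same tag at any directory depth:

```shell
#!/usr/bin/env bash
# Hypothetical helper: the tag is the basename up to (not including) its first _.
tag_of() {
  local path=$1
  local base=${path##*/}          # strip everything up to the last /
  printf '%s\n' "${base%%_*}"     # strip the first _ and everything after it
}

tag_of /data/run1/ABC_R1_001.fastq   # ABC
tag_of ABC_R1.fastq                  # ABC
```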

We can do that. It could look something like this:

#!/usr/bin/env bash
case $BASH_VERSION in ''|[0-3].*) echo "ERROR: Needs bash 4.0 or later" >&2; exit 1;; esac

Dir=$1

files=( "$Dir"/*_* )            # collect files w/ underscores in our directory in an array
declare -A example_per_tag=( )  # create a map from tag to full filename

for file in "${files[@]}"; do   # iterate over the array of files
  basename=${file##*/}          # take off the directory name to get the basename
  tag=${basename%%_*}           # take off the first _ and everything after to get the tag
  example_per_tag[$tag]=$file   # store a link from that tag to the file in our map
done

# can't slice a list of keys from an associative array, so we need an indexed array w/ them
tags=( "${!example_per_tag[@]}" ) # collect only the keys -- the tags -- in an array

# now, iterate over only the first 10 tags (key order is arbitrary, so they aren't sorted)
for tag in "${tags[@]:0:10}"; do
  echo "For tag $tag, our example file is ${example_per_tag[$tag]}"
done
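Two things to be aware of when adapting this: bash returns associative-array keys in no particular order, so "the first 10 tags" are effectively arbitrary, and the map keeps only one example file per tag (the last one seen). If the goal is to run the analysis on every file of ten tags in sorted order, a sketch along these lines might work; the function name `first_tags_files` and its `limit` parameter are hypothetical, and it assumes tags contain no newlines.

```shell
#!/usr/bin/env bash
# Sketch: every file for the first N tags in sorted order, using the same tag rule.
# Assumes bash 4+ (associative arrays, mapfile) and tags that contain no newlines.
shopt -s nullglob   # an unmatched glob expands to nothing instead of to itself

first_tags_files() {   # hypothetical name
  local dir=$1 limit=${2:-10} file base tag
  local -A seen=( )
  local -a tags=( )

  for file in "$dir"/*_*; do
    base=${file##*/}
    tag=${base%%_*}
    if [[ -n $tag ]]; then seen[$tag]=1; fi   # names starting with _ have no tag
  done
  (( ${#seen[@]} )) || return 0               # no matching files at all

  # Associative-array keys come back in no useful order, so sort them first.
  mapfile -t tags < <(printf '%s\n' "${!seen[@]}" | sort)

  for tag in "${tags[@]:0:$limit}"; do
    for file in "$dir/$tag"_*; do
      printf '%s\n' "$file"   # "do stuff" with each file here
    done
  done
}

if (( $# > 0 )); then
  first_tags_files "$1" 10
fi
```

Run as `./script.sh /path/to/dir`; the `printf` inside the inner loop is where the real per-file work would go.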

Note all the quotes here; the only places where we aren't quoting are:

  • On the right-hand side of an assignment, or the index of an array lookup (both being cases where string-splitting and globbing are implicitly disabled).
  • For a glob expression (like *_*) where we want it expanded rather than treated as a literal.
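Both exceptions can be seen side by side in a short sketch. It assumes a hypothetical scratch directory whose name contains a space: the assignment and the array index need no quotes, while the glob keeps its variable part quoted and its `*_*` part unquoted so that it still expands.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory whose name contains a space.
dir=$(mktemp -d)/'my data'
mkdir -p "$dir"
touch "$dir/ABC_1.txt" "$dir/XYZ_1.txt"

path=$dir/ABC_1.txt                # right-hand side of an assignment: no splitting or globbing
declare -A size_of=( )
size_of[$path]=small               # array index: not split either, no quotes needed

files=( "$dir"/*_* )               # variable part quoted, glob part unquoted so it expands
printf '%s\n' "${#files[@]}"       # 2
printf '%s\n' "${size_of[$path]}"  # small
```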
