Peter

Reputation: 19

Loop over different directories and change of file name

I have the following subject folder structure:

./sub-CC0006/func
..
./sub-CC0199/func

Within each func folder I have a file named after the subject, for example sub-CC0006_ses-core2p2_task-loi3_run-01_events.tsv. The code below works on a single file, but it did not work when I tried to put it in a loop. (I tried to loop over each subject directory first and then change the .tsv file name based on the subject number.)

awk -F"\t" -v OFS="\t" '{
       for (i=1;i<=NF;i++) {
         if ($i == "NaN") $i="n/a"
       }
       print $0
 }' sub-CC0006_ses-core2p2_task-loi3_run-01_events.tsv > sub-CC0006_ses-core2p2_task-loi3_run-01_events_new.tsv &&
mv sub-CC0006_ses-core2p2_task-loi3_run-01_events_new.tsv sub-CC0006_ses-core2p2_task-loi3_run-01_events.tsv

Here is an extract from one of the files I am trying to manipulate:

onset response_time
9 NaN
12 1.4

Upvotes: 0


Answers (1)

Jonathan Leffler

Reputation: 753525

The basic technique for overwriting a file with an edited version of itself is to write the edited output to a temporary file with a generic name and then move that file over the original.

I'm assuming that in the sub-CC0199 directory, the func subdirectory will contain sub-CC0199_ses-core2p2_task-loi3_run-01_events.tsv and that any other files in the directory are to be ignored, and similarly for each other directory. The script becomes simpler if you want to process all the files (or all the *.tsv files, or some other pattern match) in each of the func subdirectories for each of the subjects.

tmpfile=$(mktemp "map.XXXXXX")
trap "rm -f $tmpfile; exit 1" 0 1 2 3 13 15

suffix="_ses-core2p2_task-loi3_run-01_events.tsv"

for directory in sub-CC0???
do
    file="$directory/func/$directory$suffix"
    if [ -f "$file" ]
    then
        awk '…' "$file" > "$tmpfile" &&
        mv "$tmpfile" "$file"
    fi
done

rm -f "$tmpfile"   # Remove the temporary
trap 0             # Cancel the 'exit' trap; the script exits with status 0    
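
For a concrete run, here is a self-contained sketch of the same loop with the awk program from the question substituted for the elided one. The scratch directory, the two sample subjects and the variable name result are invented for the demonstration; treat it as an illustration under those assumptions rather than a drop-in script.

```shell
# Throwaway fixture (assumption): two subjects laid out as in the question.
work=$(mktemp -d)
cd "$work" || exit 1
suffix="_ses-core2p2_task-loi3_run-01_events.tsv"
for subject in sub-CC0006 sub-CC0199
do
    mkdir -p "$subject/func"
    printf 'onset\tresponse_time\n9\tNaN\n12\t1.4\n' > "$subject/func/$subject$suffix"
done

# The loop from the answer, with the question's awk program filled in.
tmpfile=$(mktemp "map.XXXXXX")
trap "rm -f $tmpfile; exit 1" 0 1 2 3 13 15

for directory in sub-CC0???
do
    file="$directory/func/$directory$suffix"
    if [ -f "$file" ]
    then
        # Replace any tab-separated field that is exactly NaN with n/a.
        awk -F'\t' -v OFS='\t' '{
            for (i = 1; i <= NF; i++) if ($i == "NaN") $i = "n/a"
            print
        }' "$file" > "$tmpfile" &&
        mv "$tmpfile" "$file"
    fi
done

rm -f "$tmpfile"   # Remove the temporary
trap 0             # Cancel the exit trap

# Keep one edited file for inspection, then discard the fixture.
result=$(cat "sub-CC0199/func/sub-CC0199$suffix")
printf '%s\n' "$result"
cd / && rm -rf "$work"
```

Because the loop builds each file name from the directory name, the glob only has to match the subject directories; the files themselves are never globbed.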

If you're worried about preserving links (or ownership, or permissions) on the original file, or that the original file might be a symlink you want to preserve, you can use cp "$tmpfile" "$file"; rm -f "$tmpfile" instead of mv. That is slightly slower, but unless the files are big, probably not measurably so.
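
A small sketch of the cp variant, assuming a made-up file events.tsv with a second hard link alias.tsv (both names are hypothetical). Since cp writes into the existing file instead of replacing it, the other link sees the edited content too.

```shell
# Throwaway fixture (assumption): one file with two hard links.
work=$(mktemp -d)
cd "$work" || exit 1
printf '9\tNaN\n' > events.tsv
ln events.tsv alias.tsv          # second name for the same file

tmpfile=$(mktemp "map.XXXXXX")
awk -F'\t' -v OFS='\t' '{
    for (i = 1; i <= NF; i++) if ($i == "NaN") $i = "n/a"
    print
}' events.tsv > "$tmpfile" &&
cp "$tmpfile" events.tsv         # overwrite in place; links stay intact
rm -f "$tmpfile"

# The other link shows the edit because it is still the same file.
alias_content=$(cat alias.tsv)
printf '%s\n' "$alias_content"
cd / && rm -rf "$work"
```

With mv in place of cp, events.tsv would become a new file and alias.tsv would keep the old, unedited content.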

You could generate the temporary file name within the loop; it might be marginally safer to do so if you're worried about malicious actors. The file is new (did not exist before) when created by mktemp, but after you've moved it, a malicious person could create their own symlink with that name pointing somewhere sensitive, so the script could damage other files unexpectedly. (You could also copy the temporary file over the original without removing the temporary, so the same file is used for each .tsv file; the options are legion.) You're probably not working in an environment that hostile, though.
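
Generating the name inside the loop might look like the sketch below. The fixture and the choice to create the temporary file next to the target (so that mv stays on one file system) are my assumptions, and the trap is left out to keep the example short.

```shell
# Throwaway fixture (assumption): two subjects laid out as in the question.
work=$(mktemp -d)
cd "$work" || exit 1
suffix="_ses-core2p2_task-loi3_run-01_events.tsv"
for subject in sub-CC0006 sub-CC0199
do
    mkdir -p "$subject/func"
    printf 'onset\tresponse_time\n9\tNaN\n12\t1.4\n' > "$subject/func/$subject$suffix"
done

for directory in sub-CC0???
do
    file="$directory/func/$directory$suffix"
    [ -f "$file" ] || continue
    # A fresh, previously unused name on every iteration.
    tmpfile=$(mktemp "$directory/func/map.XXXXXX") || continue
    if awk -F'\t' -v OFS='\t' '{
           for (i = 1; i <= NF; i++) if ($i == "NaN") $i = "n/a"
           print
       }' "$file" > "$tmpfile"
    then
        mv "$tmpfile" "$file"
    else
        rm -f "$tmpfile"         # do not leave a partial result behind
    fi
done

result=$(cat "sub-CC0006/func/sub-CC0006$suffix")
leftover=$(find . -name 'map.*')  # should be empty after the loop
printf '%s\n' "$result"
cd / && rm -rf "$work"
```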

The trap list is for "EXIT" (0) and signals 1 (SIGHUP), 2 (SIGINT), 3 (SIGQUIT), 13 (SIGPIPE) and 15 (SIGTERM). I learned to script when only the numbers worked, and they're compact. If you want to be slightly more modern, you could list the short names of the signals and conditions:

trap "rm -f $tmpfile; exit 1" EXIT HUP INT QUIT PIPE TERM
…
trap EXIT

or (to cancel multiple traps, though it's unnecessary when the script is about to exit):

trap - EXIT HUP INT QUIT PIPE TERM
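
To see what the trap and its cancellation do to the exit status, here is a small sketch that runs the pattern in a subshell. The function name run_demo and its cancel/keep argument are hypothetical, and the single quotes around the trap command (so that $tmpfile is expanded when the trap fires) are my choice, not part of the commands above.

```shell
# run_demo is a hypothetical helper: it sets the named traps in a subshell
# and, when called with "cancel", removes them before the subshell ends.
run_demo() {
    (
        tmpfile=$(mktemp "${TMPDIR:-/tmp}/map.XXXXXX")
        trap 'rm -f "$tmpfile"; exit 1' EXIT HUP INT QUIT PIPE TERM
        echo "scratch" > "$tmpfile"
        if [ "$1" = cancel ]
        then
            rm -f "$tmpfile"
            trap - EXIT HUP INT QUIT PIPE TERM
        fi
    )
}

run_demo cancel; status_cancelled=$?   # traps cancelled: normal exit
run_demo keep;   status_trapped=$?     # EXIT trap fires and runs 'exit 1'
echo "cancelled=$status_cancelled trapped=$status_trapped"
```

The second call shows why the script cancels the trap before finishing: with the EXIT trap still set, even a successful run would end with status 1.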

Upvotes: 1
