Geparada

Reputation: 3038

How can I run a command over batches of files?

I have a directory where the files of interest match this expression:

ls ../a2i_2_{1..96}M.sorted.DEXseq.txt
../a2i_2_10M.sorted.DEXseq.txt  ../a2i_2_25M.sorted.DEXseq.txt  ../a2i_2_3M.sorted.DEXseq.txt   ../a2i_2_54M.sorted.DEXseq.txt  ../a2i_2_69M.sorted.DEXseq.txt  ../a2i_2_83M.sorted.DEXseq.txt
../a2i_2_11M.sorted.DEXseq.txt  ../a2i_2_26M.sorted.DEXseq.txt  ../a2i_2_40M.sorted.DEXseq.txt  ../a2i_2_55M.sorted.DEXseq.txt  ../a2i_2_6M.sorted.DEXseq.txt   ../a2i_2_84M.sorted.DEXseq.txt
../a2i_2_12M.sorted.DEXseq.txt  ../a2i_2_27M.sorted.DEXseq.txt  ../a2i_2_41M.sorted.DEXseq.txt  ../a2i_2_56M.sorted.DEXseq.txt  ../a2i_2_70M.sorted.DEXseq.txt  ../a2i_2_85M.sorted.DEXseq.txt
../a2i_2_13M.sorted.DEXseq.txt  ../a2i_2_28M.sorted.DEXseq.txt  ../a2i_2_42M.sorted.DEXseq.txt  ../a2i_2_57M.sorted.DEXseq.txt  ../a2i_2_71M.sorted.DEXseq.txt  ../a2i_2_86M.sorted.DEXseq.txt
../a2i_2_14M.sorted.DEXseq.txt  ../a2i_2_29M.sorted.DEXseq.txt  ../a2i_2_43M.sorted.DEXseq.txt  ../a2i_2_58M.sorted.DEXseq.txt  ../a2i_2_72M.sorted.DEXseq.txt  ../a2i_2_87M.sorted.DEXseq.txt
../a2i_2_15M.sorted.DEXseq.txt  ../a2i_2_2M.sorted.DEXseq.txt   ../a2i_2_44M.sorted.DEXseq.txt  ../a2i_2_59M.sorted.DEXseq.txt  ../a2i_2_73M.sorted.DEXseq.txt  ../a2i_2_88M.sorted.DEXseq.txt
../a2i_2_16M.sorted.DEXseq.txt  ../a2i_2_30M.sorted.DEXseq.txt  ../a2i_2_45M.sorted.DEXseq.txt  ../a2i_2_5M.sorted.DEXseq.txt   ../a2i_2_74M.sorted.DEXseq.txt  ../a2i_2_89M.sorted.DEXseq.txt
../a2i_2_17M.sorted.DEXseq.txt  ../a2i_2_31M.sorted.DEXseq.txt  ../a2i_2_46M.sorted.DEXseq.txt  ../a2i_2_60M.sorted.DEXseq.txt  ../a2i_2_75M.sorted.DEXseq.txt  ../a2i_2_8M.sorted.DEXseq.txt
../a2i_2_18M.sorted.DEXseq.txt  ../a2i_2_32M.sorted.DEXseq.txt  ../a2i_2_47M.sorted.DEXseq.txt  ../a2i_2_61M.sorted.DEXseq.txt  ../a2i_2_76M.sorted.DEXseq.txt  ../a2i_2_90M.sorted.DEXseq.txt
../a2i_2_19M.sorted.DEXseq.txt  ../a2i_2_33M.sorted.DEXseq.txt  ../a2i_2_48M.sorted.DEXseq.txt  ../a2i_2_62M.sorted.DEXseq.txt  ../a2i_2_77M.sorted.DEXseq.txt  ../a2i_2_91M.sorted.DEXseq.txt
../a2i_2_1M.sorted.DEXseq.txt   ../a2i_2_34M.sorted.DEXseq.txt  ../a2i_2_49M.sorted.DEXseq.txt  ../a2i_2_63M.sorted.DEXseq.txt  ../a2i_2_78M.sorted.DEXseq.txt  ../a2i_2_92M.sorted.DEXseq.txt
../a2i_2_20M.sorted.DEXseq.txt  ../a2i_2_35M.sorted.DEXseq.txt  ../a2i_2_4M.sorted.DEXseq.txt   ../a2i_2_64M.sorted.DEXseq.txt  ../a2i_2_79M.sorted.DEXseq.txt  ../a2i_2_93M.sorted.DEXseq.txt
../a2i_2_21M.sorted.DEXseq.txt  ../a2i_2_36M.sorted.DEXseq.txt  ../a2i_2_50M.sorted.DEXseq.txt  ../a2i_2_65M.sorted.DEXseq.txt  ../a2i_2_7M.sorted.DEXseq.txt   ../a2i_2_94M.sorted.DEXseq.txt
../a2i_2_22M.sorted.DEXseq.txt  ../a2i_2_37M.sorted.DEXseq.txt  ../a2i_2_51M.sorted.DEXseq.txt  ../a2i_2_66M.sorted.DEXseq.txt  ../a2i_2_80M.sorted.DEXseq.txt  ../a2i_2_95M.sorted.DEXseq.txt
../a2i_2_23M.sorted.DEXseq.txt  ../a2i_2_38M.sorted.DEXseq.txt  ../a2i_2_52M.sorted.DEXseq.txt  ../a2i_2_67M.sorted.DEXseq.txt  ../a2i_2_81M.sorted.DEXseq.txt  ../a2i_2_96M.sorted.DEXseq.txt
../a2i_2_24M.sorted.DEXseq.txt  ../a2i_2_39M.sorted.DEXseq.txt  ../a2i_2_53M.sorted.DEXseq.txt  ../a2i_2_68M.sorted.DEXseq.txt  ../a2i_2_82M.sorted.DEXseq.txt  ../a2i_2_9M.sorted.DEXseq.txt

I want to run a command over 12 batches of 8 files each. Thus I made this script:

#!/bin/bash

prefix="a2i_2_"
sufix="M.sorted.DEXseq.txt"

for i in {0..7}
do
    a=$(($i*12+1))
    b=$(($i*12+12))

    ls ../$prefix{$a..$b}$sufix
done

Unfortunately, this does not work because {$a..$b} is interpreted as a literal string and not as a sequence, so I get these errors:

ls: cannot access ../a2i_2_{1..12}M.sorted.DEXseq.txt: No such file or directory
ls: cannot access ../a2i_2_{13..24}M.sorted.DEXseq.txt: No such file or directory
ls: cannot access ../a2i_2_{25..36}M.sorted.DEXseq.txt: No such file or directory
ls: cannot access ../a2i_2_{37..48}M.sorted.DEXseq.txt: No such file or directory
ls: cannot access ../a2i_2_{49..60}M.sorted.DEXseq.txt: No such file or directory
ls: cannot access ../a2i_2_{61..72}M.sorted.DEXseq.txt: No such file or directory
ls: cannot access ../a2i_2_{73..84}M.sorted.DEXseq.txt: No such file or directory
ls: cannot access ../a2i_2_{85..96}M.sorted.DEXseq.txt: No such file or directory

How can I write {$a..$b} so that it is interpreted as a sequence? Alternatively, any other way to split these files into batches would also work for me.

As an alternative, I can move the files (or make symbolic links) into a separate folder for each batch and run the command that I want:

#!/bin/bash

prefix="a2i_2_"
sufix="M.sorted.DEXseq.txt"

for i in {0..7}
do
    a=$(($i*12+1))
    b=$(($i*12+12))

    for n in $(seq $a $b)
    do
        ln -s ../$prefix$n$sufix .
    done

    # And now I can run the command for each batch ...

    awk '{sums[$1] += $2;} END { for (i in sums) print i " " sums[i]; }' *$sufix | sort -k1  > $prefix$a-$b$sufix.sum
    rm *$sufix
done

... But I would like to learn a more direct way to do it.

Thanks for your time.

Upvotes: 4

Views: 80

Answers (1)

karakfa

Reputation: 67567

You need to evaluate the expression you constructed, because bash performs brace expansion before it substitutes variables. For example:

$ a=3;b=7; echo {$a..$b};
{3..7}

$ a=3;b=7; eval echo {$a..$b}
3 4 5 6 7

However, eval is not the right approach here. You can get the values with seq, for example:

$ a=3;b=7; seq $a $b
3
4
5
6
7
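
As a minimal sketch of how seq could replace the brace expression in the loop from the question: the helper name batch_files is made up for illustration, and it assumes a seq that supports the -f format option (GNU and BSD seq both do). echo stands in for whatever command should receive the batch.

```shell
#!/bin/bash
prefix="a2i_2_"
sufix="M.sorted.DEXseq.txt"

# Hypothetical helper: print the file names numbered $1..$2, one per line.
batch_files() {
    seq -f "../${prefix}%g${sufix}" "$1" "$2"
}

for i in $(seq 0 7); do
    a=$((i * 12 + 1))
    b=$((i * 12 + 12))
    # Unquoted on purpose: these names contain no whitespace, so word
    # splitting turns the list into one argument per file.
    echo "batch $a-$b:" $(batch_files "$a" "$b")
done
```

Replacing echo with ls (or awk) passes the 12 names of each batch as separate arguments, with no eval and no symbolic links.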

Since you're already using awk, why not use it at full power? That eliminates most of the problems. For example, a template solution:

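$ awk 'FNR==1 && c==8 {print start"-"last; print sum; c=0; sum=0}   # previous batch of 8 is complete
       FNR==1         {if (c==0) start=FILENAME; c++}                # first line of a new file
                      {sum+=$2; last=FILENAME}
       END            {if (c) {print start"-"last; print sum}}' file{1..100}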

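It prints the sum for each batch of 8 files; the file order is the one produced by the brace expansion. The only downside is that it skips empty files, since an empty file never produces a first line and so is not counted toward a batch.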
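
If the goal is still the per-key sums from the question rather than one total per batch, the awk command from the question can be wrapped in a function and fed each batch directly. This is only a sketch: sum_batch is a hypothetical helper name, and the demo data below is invented to show the output shape.

```shell
#!/bin/bash
# Hypothetical helper: sum column 2 per key (column 1) over all files
# passed as arguments, then sort the result by key.
sum_batch() {
    awk '{sums[$1] += $2} END {for (k in sums) print k " " sums[k]}' "$@" | sort -k1,1
}

# Tiny demo with made-up data in a temporary directory.
demo=$(mktemp -d)
printf 'geneA 1\ngeneB 2\n' > "$demo/f1.txt"
printf 'geneA 10\ngeneC 5\n' > "$demo/f2.txt"
sum_batch "$demo/f1.txt" "$demo/f2.txt"
rm -r "$demo"

# In the loop from the question, one batch could then be written as:
#   sum_batch $(seq -f "../${prefix}%g${sufix}" "$a" "$b") > "$prefix$a-$b$sufix.sum"
```

The demo prints geneA 11, geneB 2 and geneC 5, one per line. Because each batch is passed as arguments, nothing has to be linked into or removed from the working directory.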

Upvotes: 4
