Reputation: 171

AWK for files in multiple folders

I need to use AWK to go through 1000 folders and extract the 5th column of the 2nd row from a file in each one. We'll call it file.frq. For example:

home/user/directory/data1/file.frq
...
home/user/directory/data1000/file.frq

file.frq looks like this:

 CHR  SNP   A1   A2          MAF  NCHROBS
   3  fa0    A    G         0.22      300

I need the output of the AWK script to just list the 1-MAF value (1-0.22 in this case, so 0.78) for the .frq file in each data directory, so 1000 values in total. I was playing around with find, but it is new to me and I'm not sure it's the right tool.

Upvotes: 0

Answers (3)

ebo

Reputation: 2747

To get only the values:

find /home/user/directory/ -name file.frq -exec awk 'FNR == 2 { print 1-$5 }' {} \;

To also get the filename in the output:

find /home/user/directory/ -name file.frq -exec awk 'FNR == 2 { print FILENAME " " 1-$5 }' {} \;
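
A possible variation, assuming your find supports the POSIX `-exec ... {} +` form: it hands many file names to one awk process instead of starting awk once per file. The sketch below builds a small throwaway tree first so it can be run as is; the temporary directory, the three sample folders, and the `result` variable are made up for illustration.

```shell
# Build a throwaway tree with three sample files (hypothetical test data).
tmp=$(mktemp -d)
for i in 1 20 1000; do
    mkdir -p "$tmp/data$i"
    printf ' CHR  SNP   A1   A2          MAF  NCHROBS\n   3  fa0    A    G         0.22      300\n' \
        > "$tmp/data$i/file.frq"
done

# '{} +' passes many file names to a single awk process. FNR restarts at 1
# for every input file, so FNR == 2 still selects the second line of each.
result=$(find "$tmp" -name file.frq -exec awk 'FNR == 2 { print FILENAME, 1 - $5 }' {} +)
printf '%s\n' "$result"

# Clean up the sample tree.
rm -rf "$tmp"
```

The order in which find visits the folders is not guaranteed, so the output still needs sorting if order matters.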

Edit

To sort the output in the desired order (this needs the variant that prints the filename), you could, for example, pipe the results through:

sed s/data// | sort -n | sed s/^/data/

or shorter:

sort -ta -k3n
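
As a rough illustration of why that works: `-ta` makes the letter a the field separator, so in a line such as data20/file.frq 0.78 the third field starts right after "data", and `-k3n` sorts on that number. The sample lines and the `result` variable below are made up, and the trick assumes no other letter a appears earlier in the path.

```shell
# Three sample output lines, deliberately out of order (hypothetical data).
# Splitting on "a" gives: "d", "t", "<number>/file.frq 0.78"; -k3n sorts
# numerically on the leading digits of that third field.
result=$(printf '%s\n' \
    'data1000/file.frq 0.78' \
    'data20/file.frq 0.78' \
    'data1/file.frq 0.78' | sort -ta -k3n)
printf '%s\n' "$result"
```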

Upvotes: 2

yate

Reputation: 804

awk 'FNR == 2 {print FILENAME, 1 - $5}' data*/file.frq | sort -V

If it's the second record of a file, print the file name and 1 minus the fifth column. A version sort (sort -V) seems to give the proper ordering.
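
If your sort has no -V (it is a GNU extension), one alternative is to visit the folders in numeric order yourself. This is only a sketch under assumptions: the folders are named data1 to data1000, and the temporary sample tree and the `result` variable are invented so the block runs on its own.

```shell
# Build a throwaway tree with three sample files (hypothetical test data).
tmp=$(mktemp -d)
for i in 1 20 1000; do
    mkdir -p "$tmp/data$i"
    printf ' CHR  SNP   A1   A2          MAF  NCHROBS\n   3  fa0    A    G         0.22      300\n' \
        > "$tmp/data$i/file.frq"
done

# Count from 1 to 1000 so the files are read in numeric order already.
result=$(
    cd "$tmp" || exit 1
    i=1
    while [ "$i" -le 1000 ]; do
        f="data$i/file.frq"
        # Skip numbers that have no matching file instead of letting awk fail.
        [ -f "$f" ] && awk 'FNR == 2 { print FILENAME, 1 - $5 }' "$f"
        i=$((i + 1))
    done
)
printf '%s\n' "$result"

# Clean up the sample tree.
rm -rf "$tmp"
```

This starts one awk per file, so it is slower than the glob version, but the output needs no sorting afterwards.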

Upvotes: 4

konsolebox

Reputation: 75588

With Ruby:

ruby -e 'def get_i(f); f.gsub(/^.*\/data/, "").gsub(/\/file.frq$/, "").to_i; end;
    Dir.glob("/home/user/directory/data*/file.frq").sort{|a,b| get_i(a) <=> get_i(b)}.each{|f|
        File.readlines(f).each{|l| v = (Float(l.split[4]) rescue nil) and puts "#{f} #{(1-v).to_s}"}}'

I had this output on a test version:

/tmp/data1/file.frq 0.78
/tmp/data20/file.frq 0.78
/tmp/data1000/file.frq 0.78

Upvotes: 0
