Reputation: 171
I need to use AWK to sort through 1000 folders and extract the 2nd row of the 5th column in a file. We'll call it file.frq. For example:
home/user/directory/data1/file.frq
...
home/user/directory/data1000/file.frq
file.frq looks like this:
CHR SNP A1 A2 MAF NCHROBS
3 fa0 A G 0.22 300
I need the output of the AWK script to just list the 1-MAF value (1-0.22 in this case, so 0.78) for each of the 1000 .frq files, one per data directory. I was playing around with find, but it is new to me and I'm not sure it's the right tool.
Upvotes: 0
Views: 1340
Reputation: 2747
To get only the values:
find /home/user/directory/ -name file.frq -exec awk 'FNR == 2 { print 1-$5 }' {} \;
To also get the filename in the output:
find /home/user/directory/ -name file.frq -exec awk 'FNR == 2 { print FILENAME " " 1-$5 }' {} \;
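A possible variant, sketched here on a throwaway tree: ending -exec with {} + instead of {} \; lets find pass many file names to one awk process rather than starting awk once per file. The temporary directory and the three sample data directories are assumptions made only so the example runs on its own; they stand in for /home/user/directory.

```shell
# Build a small throwaway tree (hypothetical stand-in for /home/user/directory).
root=$(mktemp -d)
for i in 1 2 10; do
    mkdir -p "$root/data$i"
    printf 'CHR SNP A1 A2 MAF NCHROBS\n3 fa0 A G 0.22 300\n' > "$root/data$i/file.frq"
done

# '{} +' hands many file names to a single awk process. FNR restarts at 1
# for every input file, so FNR == 2 still selects the second row of each one.
result=$(find "$root" -name file.frq -exec awk 'FNR == 2 { print FILENAME, 1 - $5 }' {} +)
printf '%s\n' "$result"

# Remove the throwaway tree again.
rm -rf "$root"
```

find does not promise any particular order for the files it visits, so the lines may still need sorting afterwards.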
Edit
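To sort the output of the second command in the desired order you could for example pipe the results through the following (this form assumes each output line begins with data, e.g. data1/file.frq 0.78):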
sed s/data// | sort -n | sed s/^/data/
or shorter:
sort -ta -k3n
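To see why the short form works, here is a small sketch on three hand-written sample lines in the shape the second find command prints. The sample lines are made up for illustration, and the trick relies on the path containing no letter a before data, which happens to hold for /home/user/directory.

```shell
# Sample lines in the shape printed by the find/awk command with FILENAME.
lines='/home/user/directory/data10/file.frq 0.78
/home/user/directory/data2/file.frq 0.78
/home/user/directory/data1/file.frq 0.78'

# With 'a' as the field separator each line splits into
#   "/home/user/directory/d", "t", "10/file.frq 0.78"
# so field 3 starts with the directory number and -k3n sorts on it numerically.
sorted=$(printf '%s\n' "$lines" | sort -ta -k3n)
printf '%s\n' "$sorted"
```

If the real path contains an a earlier on (for example /data/project/...), the field number would have to be adjusted accordingly.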
Upvotes: 2
Reputation: 804
awk 'FNR == 2 {print FILENAME, 1 - $5}' data*/file.frq | sort -V
If it's the second record of a file, print the file name and 1 minus the fifth column. A version sort (sort -V) then puts data1 through data1000 in natural numeric order.
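Where sort -V is not available, one alternative (a different technique from the one-liner above, sketched under assumptions) is to walk the directories in numeric order so that no sort step is needed. The temporary tree, the three sample directories and the upper bound of 10 are hypothetical; in the question's layout the bound would be 1000 and root would be /home/user/directory.

```shell
# Build a small throwaway tree (hypothetical stand-in for /home/user/directory).
root=$(mktemp -d)
for i in 1 2 10; do
    mkdir -p "$root/data$i"
    printf 'CHR SNP A1 A2 MAF NCHROBS\n3 fa0 A G 0.22 300\n' > "$root/data$i/file.frq"
done

# Visit data1, data2, ... in numeric order and skip numbers that have no file.
result=$(
    i=1
    while [ "$i" -le 10 ]; do
        f="$root/data$i/file.frq"
        if [ -f "$f" ]; then
            awk 'FNR == 2 { print FILENAME, 1 - $5 }' "$f"
        fi
        i=$((i + 1))
    done
)
printf '%s\n' "$result"

# Remove the throwaway tree again.
rm -rf "$root"
```

This starts one awk process per file, which is slower than the glob version but keeps the output order independent of any sort implementation.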
Upvotes: 4
Reputation: 75588
With Ruby:
ruby -e 'def get_i(f); f.gsub(/^.*\/data/, "").gsub(/\/file.frq$/, "").to_i; end;
Dir.glob("/home/user/directory/data*/file.frq").sort{|a,b| get_i(a) <=> get_i(b)}.each{|f|
File.readlines(f).each{|l| v = (Float(l.split[4]) rescue nil) and puts "#{f} #{(1-v).to_s}"}}'
I had this output on a test version:
/tmp/data1/file.frq 0.78
/tmp/data20/file.frq 0.78
/tmp/data1000/file.frq 0.78
Upvotes: 0