Michael

Reputation: 11

Using awk in a bash script with variables to parse

I am trying to run queries against a big file, using awk in a bash script. The script reads parameters line by line from a parameter file and puts them in variables, which are then passed to awk. The result of each query needs to be stored in a separate file, named as specified in the parameter file:

#!/bin/bash

while IFS=\t read chr start end name
do 

echo $chr $start $end $name

awk -v "chr=$chr" -v "start=$start" -v "end=$end" '$1==chr && $3>start && $3<end && $11<5E-2 {print $0}' bigfile.out > ${name}.out

done < parameterfile

Unfortunately, the awk command does not produce any output. Any idea what might be wrong? (Judging by the echo command, the bash variables appear to be assigned correctly.)

Upvotes: 1

Views: 3070

Answers (3)

GreenFox

Reputation: 1122

I do not know why bash is needed in between, but if reading the parameters from a file is a requirement, then the whole job can be done in awk, and this should work:

#!/bin/bash  
cat parameterfile |awk 'BEGIN{  
    FS="\t";  
}{  
 # If parameterfile has multiple lines and you want to allow comments in it, perhaps
 #  if($0~"^[ \t]*#")next;
 # will make lines starting with # (after any amount of spaces or tabs) be recognized
 # as comments instead of parameters :-)
 #
 # Read the parameter file, whatever format it may be.
 # Here we assume parameterfile is tab separated, so inside the BEGIN{} we specify FS as tab.
 # If it is a CSV, then A[0]=split($0,A,","); and then chr=A[1]; and so on.
 chr=$1;  
 start=$2;  
 end=$3;  
 name=$4;  
 # Now read the data file. Its name could also come from the parameter file, or from -v var=arg on awk
 file_to_read_from="bigfile.out";  
 while((getline line_of_data < file_to_read_from)>0){  
    # Since I do not have psychic powers to guess the format of the input of the file, here is some example  
    # If it is separated by one or more spaces
    # B[0]=split(line_of_data,B,"[ ]+");
    # If it is separated by tabs  
    B[0]=split(line_of_data,B,"\t");  

    # Check if the data line matches the condition built from the parameters
    if( B[1]==chr && B[3]>start && B[3]<end && B[11]<5E-2 ){
      # Print the matching data line to the output file named in the parameter file
      print line_of_data > (name ".out");
    }  

 }  
 # Done reading all lines from file_to_read_from.
 # Close it so the next parameter line re-reads it from the start
 close(file_to_read_from);
 # If parameterfile has multiple lines, the remaining ones are processed the same way.
 # If you only want the first line of the parameter file to be read, then
 # exit 0;
 # should get you out of here
}'   

Upvotes: 0

Hai Vu

Reputation: 40688

The key is the IFS setting:

while IFS='   ' read chr start end name

where what is between the single quotes is a single literal tab character, not spaces.
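If pasting a literal tab into the script is awkward, bash's ANSI-C quoting $'\t' produces the same character. Below is a minimal sketch of the question's loop written that way, assuming bash (not plain sh) and whitespace-separated data as in the original awk call. The function name run_queries and its three arguments (parameter file, data file, output directory) are made up for illustration.

```bash
#!/bin/bash
# Hypothetical wrapper around the loop from the question.
# $1 = tab-separated parameter file, $2 = data file, $3 = output directory
run_queries() {
    local params="$1" data="$2" outdir="$3"
    local chr start end name
    # $'\t' is ANSI-C quoting for one tab character, so read splits on tabs.
    while IFS=$'\t' read -r chr start end name; do
        # Same condition as in the question; the default action prints the line.
        awk -v chr="$chr" -v start="$start" -v end="$end" \
            '$1 == chr && $3 > start && $3 < end && $11 < 5E-2' \
            "$data" > "$outdir/$name.out"
    done < "$params"
}

# Example call (file names taken from the question):
# run_queries parameterfile bigfile.out .
```

Quoting "$name" and the other expansions also keeps names containing spaces intact, which the unquoted ${name}.out in the question would not.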

Upvotes: 1

clt60

Reputation: 63892

IMHO bash does not interpret "\t" in IFS as a tab; unquoted, \t is simply the letter t. Try this:

while IFS=$(echo -e "\t") read chr start end name
do
        echo =$chr=$start=$end=$name=
done <<EOF
11      1       10      aaa bbb
12      3       30      ccc bbb
EOF

This one will split tab-delimited text. Your variant will assign everything to $chr. When debugging, always print variables with visible delimiters around them, '=' for example. :)
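To see the difference side by side, here is a small sketch, assuming bash. The helper name show_split and the sample line are invented for the demonstration. It reads the same tab-separated line twice, once with IFS set the way the question does it and once with a real tab.

```bash
#!/bin/bash
# Hypothetical helper: read a line into two fields using the given IFS
# and print the fields with visible '=' delimiters.
show_split() {
    local ifs_value="$1" line="$2" a b
    IFS=$ifs_value read -r a b <<< "$line"
    printf '=%s=%s=\n' "$a" "$b"
}

# Unquoted \t reaches the function as the letter t, so nothing is split
# and the whole line (tab included) lands in the first field.
show_split \t    $'chr1\t100'

# A real tab splits the line as intended: prints =chr1=100=
show_split $'\t' $'chr1\t100'
```

A plain echo $chr $start $end $name hides this problem, because the whole line is printed either way.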

Upvotes: 1
