Reputation: 11
I am trying to run queries against a big file, using awk in a bash script. The script reads parameters (line by line) from a parameter file and puts them in variables, which are then passed to awk. The result of each query needs to be stored in a separate file, named as specified in the parameter file:
#!/bin/bash
while IFS=\t read chr start end name
do
echo $chr $start $end $name
awk -v "chr=$chr" -v "start=$start" -v "end=$end" '$1==chr && $3>start && $3<end && $11<5E-2 {print $0}' bigfile.out > ${name}.out
done < parameterfile
Unfortunately, the awk command does not produce any output. Any idea what might be wrong? (Judging by the echo command, the bash variables are assigned correctly.)
Upvotes: 1
Views: 3070
Reputation: 1122
I do not know why bash is needed in between, but if reading the parameters from a file or from the user is a requirement, then this should work:
#!/bin/bash
awk 'BEGIN{
    FS="\t";
}{
    # If parameterfile has multiple lines and you want to allow comments in it, perhaps
    # if($0~"^[ \t]*#")next;
    # will make lines starting with # (after any number of spaces or tabs) be recognized
    # as comments instead of parameters :-)
    #
    # Read the parameter file, whatever format it may be.
    # Here we assume parameterfile is tab separated, so inside the BEGIN{} we specify FS as tab.
    # If it is a CSV, then A[0]=split($0,A,","); and then chr=A[1]; and so on.
    chr=$1;
    start=$2;
    end=$3;
    name=$4;
    # Let us start reading the file. We could take its name from the parameter file, if you want, or from a -v var=arg on awk.
    file_to_read_from="bigfile.out";
    while((getline line_of_data < file_to_read_from)>0){
        # Since I do not have psychic powers to guess the format of the input file, here are some examples.
        # If it is separated by one or more spaces
        # B[0]=split(line_of_data,B,"[ ]+");
        # If it is separated by tabs
        B[0]=split(line_of_data,B,"\t");
        # Check whether the line matches our specified condition
        if( B[1]==chr && B[3]>start && B[3]<end && B[11]<5E-2 ){
            # Print the matching data line (not $0, which is the parameter line) to the destination
            print line_of_data > (name".out");
        }
    }
    # Done reading all lines from file_to_read_from.
    # Close the opened files, so that we can handle millions of parameter lines
    close(file_to_read_from);
    close(name".out");
    # If parameterfile has multiple lines, the next one is processed now.
    # If you only want the first line of the parameter file to be read, then
    # exit 0;
    # should get you out of here
}' parameterfile
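The script above re-reads bigfile.out once per parameter line. For a big file, a single pass may be preferable. The following is only a sketch under assumptions: both files are taken to be tab separated, the sample data and the array names (chr, lo, hi, out) are made up for illustration, and it writes into a temporary directory so it can be run as is.

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory, so the sketch is runnable on its own.
cd "$(mktemp -d)" || exit 1
printf 'chr1\t100\t200\tregionA\nchr2\t0\t50\tregionB\n' > parameterfile
{
    printf 'chr1\tx\t150\ta\tb\tc\td\te\tf\tg\t0.01\n'   # matches regionA
    printf 'chr2\tx\t10\ta\tb\tc\td\te\tf\tg\t0.02\n'    # matches regionB
    printf 'chr2\tx\t10\ta\tb\tc\td\te\tf\tg\t0.9\n'     # column 11 too large
} > bigfile.out

awk -F'\t' '
NR == FNR {                      # first file on the command line: the parameters
    n++
    chr[n] = $1                  # compared as a string
    lo[n]  = $2 + 0              # "+ 0" forces a numeric value
    hi[n]  = $3 + 0
    out[n] = $4 ".out"
    next
}
{                                # second file: the big data file, read only once
    pos = $3 + 0
    for (i = 1; i <= n; i++)
        if ($1 == chr[i] && pos > lo[i] && pos < hi[i] && $11 + 0 < 5E-2)
            print > out[i]       # every output file stays open until awk exits
}
' parameterfile bigfile.out
```

The trade-off is that every output file stays open until awk finishes, so with a very large number of parameter lines some awk implementations may run into their open-file limit.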
Upvotes: 0
Reputation: 40688
The key is the IFS setting:
while IFS=' ' read chr start end name
where the character between the single quotes is a literal tab character, not a space.
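Since a literal tab is easy to lose when copying and pasting, bash's ANSI-C quoting ($'\t') can be used to write it visibly. Below is a minimal sketch of the loop from the question with that change. The sample contents of parameterfile and bigfile.out are hypothetical, and the -r option to read and the quoting of the output file name are additions that are not in the original script.

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory, so the sketch is runnable on its own.
cd "$(mktemp -d)" || exit 1
printf 'chr1\t100\t200\tregionA\n' > parameterfile
{
    printf 'chr1\tx\t150\ta\tb\tc\td\te\tf\tg\t0.01\n'   # should match
    printf 'chr1\tx\t150\ta\tb\tc\td\te\tf\tg\t0.5\n'    # column 11 too large
    printf 'chr2\tx\t150\ta\tb\tc\td\te\tf\tg\t0.01\n'   # wrong chromosome
} > bigfile.out

# $'\t' expands to one real tab character, so read splits each line on tabs.
while IFS=$'\t' read -r chr start end name
do
    awk -v "chr=$chr" -v "start=$start" -v "end=$end" \
        '$1==chr && $3>start && $3<end && $11<5E-2' bigfile.out > "${name}.out"
done < parameterfile
```

With this sample data, regionA.out should end up holding only the first data line.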
Upvotes: 1
Reputation: 63892
IMHO bash does not interpret "\t" in IFS as a tab (unquoted, \t is just the letter t). Try this:
while IFS=$(echo -e "\t") read chr start end name
do
echo =$chr=$start=$end=$name=
done <<EOF
11 1 10 aaa bbb
12 3 30 ccc bbb
EOF
This one will split tab-delimited text. Your variant will assign everything to $chr. When debugging, always print variable assignments with visible delimiters :) such as '='.
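To see the difference side by side, here is a small sketch that reads the same record twice, once with IFS=\t as in the question and once with a real tab. The record and the variable names (line, a, b, c, d, wrong, right) are made up for illustration.

```shell
#!/bin/bash
# Hypothetical tab-separated record.
line=$'11\t1\t10\taaa'

# Unquoted \t loses its backslash, so IFS is the letter "t".
# The record contains no "t", so the whole line lands in the first variable.
IFS=\t read -r a b c d <<< "$line"
wrong="=$a=$b=$c=$d="

# $'\t' is a real tab, so the record is split into four fields.
IFS=$'\t' read -r a b c d <<< "$line"
right="=$a=$b=$c=$d="

# Print both results with visible "=" delimiters.
printf '%s\n' "$wrong" "$right"
```

The second printed line should be =11=1=10=aaa=, while the first keeps the whole record (tabs included) between the first two = signs.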
Upvotes: 1