Reputation: 822
Programming beginner here needs some help modifying an AWK script to make it conditional. Alternative non-awk solutions are also very welcome.
NOTE Main filtering is now working thanks to help from Birei but I have an additional problem, see note below in question for details.
I have a series of input files with 3 columns like so:
chr4 190499999 190999999
chr6 61999999 62499999
chr1 145499999 145999999
I want to use these rows to filter another file (refGene.txt) and if a row in file one mathces a row in refGene.txt, to output column 13 in refGene.txt to a new file 'ListofGenes_$f'. The tricky part for me is that I want it to count as a match as long as column one (eg 'chr4', 'chr6', 'chr1' ) and column 2 AND/OR column 3 matches the equivalent columns in the refGene.txt file. The equivalent columns between the two files are $1=$3, $2=$5, $3=$6. Then I am not sure in awk how to not print the whole row from refGene.txt but only column 13.
NOTE I have achieved the conditional filtering described above thanks to help from Birei. Now I need to incorporate an additional filter condition. I also need to output column $13 from the refGene.txt file if any of the region between value $2 and $3 overlaps with the region between $5 and $6 in the refGene.txt file. This seems a lot trickier as it involves mathmatical computation to see if the regions overlap.
My script so far:
FILES=/files/*txt
for f in $FILES ;
do
awk '
BEGIN {
FS = "\t";
}
FILENAME == ARGV[1] {
pair[ $1, $2, $3 ] = 1;
next;
}
{
if ( pair[ $3, $5, $6 ] == 1 ) {
print $13;
}
}
' $(basename $f) /files/refGene.txt > /files/results/$(basename $f) ;
done
Any help is really appreciated. Thanks so much!
Rubal
Upvotes: 0
Views: 1194
Reputation: 36262
One way.
awk '
BEGIN { FS = "\t"; }
## Save third, fifth and seventh field of first file in arguments (refGene.txt) as the key
## to compare later. As value the field to print.
FNR == NR {
pair[ $3, $5, $6 ] = $13;
next;
}
## Set the name of the output file.
FNR == 1 {
output_file = "";
split( ARGV[ARGIND], path, /\// );
for ( i = 1; i < length( path ); i++ ) {
current_file = ( output_file ? "/" : "" ) path[i];
}
output_file = output_file "/ListOfGenes_" path[i];
}
## If $1 = $3, $2 = $5 and $3 = $6, print $13 to output file.
{
if ( pair[ $1, $2, $3 ] ) {
print pair[ $1, $2, $3 ] >output_file;
}
}
' refGene.txt /files/rubal/*.txt
Upvotes: 1