Carmen Sandoval

Reputation: 2356

While read line, awk $line and write to variable

I am trying to split a file into different smaller files depending on the value of the fifth field. A very nice way to do this was already suggested and also here.

However, I am trying to incorporate this into a .sh script for qsub, without much success.

The problem is in the part that specifies the file each line is written to, i.e.

f = "Alignments_" $5 ".sam" print > f

There I need to use a variable declared earlier in the script, which specifies the directory where the file should be written. The variable is built for each task when I submit the array job for multiple files.

So say $output_path = ./Sample1

I need to write something like

f = $output_path "/Alignments_" $5 ".sam"        print > f

But it does not seem to like having a $variable that is not a $field belonging to awk. I don't even think it likes having two "strings" before and after the $5.

The error I get back is that it takes the first line of the file to be split (little.sam) and tries to name f like that, followed by /Alignments_" $5 ".sam" (those last three put together correctly). It says, naturally, that it is too big a name.

How can I write this so it works?

Thanks!

awk -F '[:\t]' '    # read the list of numbers in Tile_Number_List
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the .BAM file
    # any lines with an "unknown" $5 will be ignored
$5 in num {
    f = "Alignments_" $5 ".sam"        print > f
} ' Tile_Number_List.txt little.sam

UPDATE, after adding -v to awk and declaring the variable opath

input=$1
outputBase=${input%.bam}

mkdir -v $outputBase\_TEST

newdir=$outputBase\_TEST

samtools view -h $input | awk 'NR >= 18' | awk -F '[\t:]' -v opath="$newdir" '

FNR == NR {
    num[$1]
    next
}

$5 in num {
    f = newdir"/Alignments_"$5".sam";
    print > f
} ' Tile_Number_List.txt -

mkdir: created directory `little_TEST'
awk: cmd. line:10: (FILENAME=- FNR=1) fatal: can't redirect to `/Alignments_1101.sam' (Permission denied)

Upvotes: 0

Views: 1892

Answers (2)

Chris Seymour

Reputation: 85815

To pass the value of a shell variable such as $output_path to awk, you need to use the -v option:

$ output_path=./Sample1/

$ awk -F '[:\t]' -v opath="$output_path" '
    # read the list of numbers in Tile_Number_List
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the .BAM file
    # any lines with an "unknown" $5 will be ignored
    $5 in num {
        f = opath"Alignments_"$5".sam"
        print > f
    } ' Tile_Number_List.txt little.sam

Also, you still have the error from your previous question left in your script: the assignment to f and the print > f need to be separate statements, as in the code above.
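To try the -v approach end to end, here is a minimal sketch. The read names, tile numbers and the temporary directory are made up for illustration; only the awk program follows the script above.

```shell
# Hypothetical sample data, invented for this sketch only
workdir=$(mktemp -d)
output_path="$workdir/Sample1"
mkdir -p "$output_path"

# Tile list: one tile number per line
printf '1101\n1102\n' > "$workdir/Tile_Number_List.txt"

# Three fake SAM-like lines; with -F '[:\t]' the tile number is field 5
printf 'M1:7:FC1:1:1101:100:200\t0\tchr1\n' >  "$workdir/little.sam"
printf 'M1:7:FC1:1:1102:300:400\t0\tchr1\n' >> "$workdir/little.sam"
printf 'M1:7:FC1:1:9999:500:600\t0\tchr1\n' >> "$workdir/little.sam"

awk -F '[:\t]' -v opath="$output_path" '
    FNR == NR { num[$1]; next }            # first file: remember the tile numbers
    $5 in num {                            # second file: keep only known tiles
        f = opath "/Alignments_" $5 ".sam" # opath is referenced without a $
        print > f
    }' "$workdir/Tile_Number_List.txt" "$workdir/little.sam"

ls "$output_path"
```

With this input, ls should list Alignments_1101.sam and Alignments_1102.sam; the line for tile 9999 is dropped because 9999 is not in the list.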

EDIT:

The awk variable created with -v is opath, but you use newdir. What you want is:

input=$1
outputBase=${input%.bam}
mkdir -v $outputBase\_TEST
newdir=$outputBase\_TEST

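samtools view -h "$input" | awk -F '[\t:]' -v opath="$newdir" '
FNR == NR {
    num[$1]
    next
}
FNR >= 18 && $5 in num {   # skip the first 17 lines of the samtools output
    f = opath"/Alignments_"$5".sam"   # <-- opath is the awk variable not newdir
    print > f
}' Tile_Number_List.txt -

You can also fold the NR >= 18 filter into the second rule of this awk script, written as FNR >= 18 because FNR restarts for each input file, instead of running a separate awk process for it.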
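As for the Permission denied message itself: a name that was never assigned inside awk evaluates to the empty string, so the path starts with / and awk tries to write at the filesystem root. A small sketch of that, using the directory name from the question's output as an example value:

```shell
# newdir is not defined inside awk, so it contributes an empty string
unset_case=$(awk 'BEGIN { print newdir "/Alignments_" "1101" ".sam" }')

# opath is the name given to -v, so it carries the shell value
set_case=$(awk -v opath="little_TEST" 'BEGIN { print opath "/Alignments_" "1101" ".sam" }')

printf '%s\n%s\n' "$unset_case" "$set_case"
```

The first line printed is /Alignments_1101.sam, which matches the file name in the error, and the second is little_TEST/Alignments_1101.sam.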

Upvotes: 1

Ed Morton

Reputation: 203684

awk variables are like C variables - just reference them by name to get their value, no need to stick a "$" in front of them like you do with shell variables:

awk -F '[:\t]' '    # read the list of numbers in Tile_Number_List
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the .BAM file
    # any lines with an "unknown" $5 will be ignored
$5 in num {
    output_path = "./Sample1/"
    f = output_path "Alignments_" $5 ".sam"
    print > f
} ' Tile_Number_List.txt little.sam
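This probably also explains the "name too big" error in the question. Inside awk, $name means "the field whose number is the value of name", and an unset variable counts as 0, so $output_path becomes $0, the whole input line. A minimal sketch with a made-up input line:

```shell
line='a b c'

# output_path is unset inside awk, so $output_path is $0 (the entire line)
bad=$(printf '%s\n' "$line" | awk '{ print $output_path "/Alignments_" $2 ".sam" }')

# Set with -v and referenced by bare name, it is an ordinary string
good=$(printf '%s\n' "$line" | awk -v output_path="./Sample1" '{ print output_path "/Alignments_" $2 ".sam" }')

printf '%s\n%s\n' "$bad" "$good"
```

This prints a b c/Alignments_b.sam followed by ./Sample1/Alignments_b.sam. With a real SAM line in place of a b c, the first form produces a very long file name.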

Upvotes: 1
