pierogi

Reputation: 25

How to produce multiple readlength.tsv files at once from multiple fastq files?

I have 16 fastq files in different directories, and I need to produce a separate readlength.tsv for each of them. This is the script I should use to produce readlength.tsv:

zcat ~/proje/project/name/fıle_fastq | paste - - - - | cut -f1,2 | while read readID sequ;
do
    len=`echo $sequ | wc -m`
    echo -e "$readID\t$len"
done > ~/project/name/fıle1_readlength.tsv

I can produce the read lengths one file at a time, but that will take a long time. I want to produce them all at once, so I created a list containing these fastq files, but I couldn't write a loop that produces readlength.tsv for all 16 fastq files.

I would appreciate it if you could help me.

Upvotes: 1

Views: 98

Answers (1)

tshiono

Reputation: 22012

Assuming a file list.txt contains the 16 file paths such as:

~/proje/project/name/file1_fastq
~/proje/project/name/file2_fastq
..
~/path/to/the/fastq_file16

Then would you please try:

#!/bin/bash

while IFS= read -r f; do                # "f" is assigned to each fastq filename in "list.txt"
    f=${f/#\~/$HOME}                    # a leading "~" read from a file is not expanded automatically
    mapfile -t ary < <(zcat "$f")       # assign "ary" to the array of lines
    echo -e "${ary[0]}\t${#ary[1]}"     # ${ary[0]} is the id and ${#ary[1]} is the length of sequence
done < list.txt > readlength.tsv

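Because a fastq record holds the id on its 1st line and the sequence on its 2nd line, the bash built-in mapfile is a convenient way to read them into an array. Note that the script above reports only the first record of each file and collects all results in a single readlength.tsv.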

As a side note, the letter ı in your code (for example in fıle_fastq) looks like a non-ASCII character.
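
If the goal is one line per read and a separate TSV per fastq file, as in the question's script, a minimal sketch could look like the following. It assumes standard 4-line fastq records and gzip-compressed input. The function names and the "_readlength.tsv" output naming are hypothetical, chosen only for illustration. Unlike the question's script, it prints the whole header line as the id.

```shell
#!/bin/bash

# Print "<header line><TAB><sequence length>" for every record of one
# gzipped fastq file. Assumes 4-line records: id, sequence, "+", quality.
fastq_readlengths() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { id = $0 }
                         NR % 4 == 2 { print id "\t" length($0) }'
}

# Write one TSV next to each input file named in a list such as list.txt.
# The "<input>_readlength.tsv" naming is an assumption, not from the question.
readlengths_from_list() {
    local f
    while IFS= read -r f; do
        [ -n "$f" ] || continue          # skip empty lines
        f=${f/#\~/$HOME}                 # "~" read from a file is not expanded by the shell
        fastq_readlengths "$f" > "${f}_readlength.tsv"
    done < "$1"
}
```

After sourcing the script, it might be called as readlengths_from_list list.txt. The files are processed one after another. awk streams each file line by line, so whole fastq files are not held in memory.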

Upvotes: 1
