Substituting specific nucleotides in FASTQ files in Linux

Question

I have some FASTQ files that I need to analyse. The main issue is that the analysis tool I'm currently working with only accepts A, C, T and G as nucleotides, not the other symbols in the IUPAC code (R, W, etc.).

I wrote this code to replace those nucleotides:

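awk '{
    # Split the sequence (taken from field 2, as in my data) into characters.
    len = split($2, a, "")
    str = ""
    # Loop by index: "for (n in a)" does not guarantee the order.
    for (n = 1; n <= len; n++) {
        nucleotide = a[n]
        if (nucleotide ~ /[ACTG]/) {str = str nucleotide}
        else if (nucleotide ~ /[RWMV]/) {str = str "A"}
        else if (nucleotide ~ /[YD]/) {str = str "C"}
        else if (nucleotide ~ /[SKN]/) {str = str "G"}
        else {str = str "T"}
    }
    print str
}' | head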

It is working but it is super slow. Do you know a more efficient way to do it?

Thank you so much!

Answers (1)
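
One possible approach, offered as a sketch rather than a definitive solution: let awk's gsub() rewrite the whole sequence line at once instead of looping over single characters and concatenating strings, which is typically much faster. The function name normalize_fastq is hypothetical, and the sketch assumes strict 4-line FASTQ records (header, sequence, "+", qualities) with uppercase bases, so the sequence is every line where NR % 4 == 2. The mapping is the one from the question: R, W, M, V become A; Y, D become C; S, K, N become G; anything else that is not A, C, G or T becomes T.

```shell
# Hypothetical helper: reads FASTQ on stdin, writes converted FASTQ to stdout.
# Assumes 4-line records and uppercase bases; only sequence lines are changed.
normalize_fastq() {
    awk 'NR % 4 == 2 {
        gsub(/[RWMV]/, "A")    # same mapping as in the question
        gsub(/[YD]/, "C")
        gsub(/[SKN]/, "G")
        gsub(/[^ACGT]/, "T")   # every remaining symbol (B, H, ...) becomes T
    }
    { print }'
}
```

It could be used as normalize_fastq < reads.fastq > reads.acgt.fastq, where both file names are placeholders. The catch-all substitution has to come last, since it would otherwise overwrite symbols that the earlier rules are meant to handle.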