I have an array, let's call it ensembldb, that has the following lines:
rs2799070 ENST00000379389 ENSG00000187608 ISG15 inframe_insertion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NM_005101.3 NP_005092
rs2799070 ENST00000458555 ENSG00000224969 AL645608.2 missense_variant NA NA antisense NA NULL NULL
rs2799070 ENST00000624652 ENSG00000187608 ISG15 inframe_deletion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
rs2799070 ENST00000624697 ENSG00000187608 ISG15 frameshift_variant NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
and another ordered array, let's call it ordered_array:
frameshift_variant
missense_variant
inframe_insertion
inframe_deletion
I would like to sort my array ensembldb to match the order of the entries in ordered_array. The expected output is the following:
rs2799070 ENST00000624697 ENSG00000187608 ISG15 frameshift_variant NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
rs2799070 ENST00000458555 ENSG00000224969 AL645608.2 missense_variant NA NA antisense NA NULL NULL
rs2799070 ENST00000379389 ENSG00000187608 ISG15 inframe_insertion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NM_005101.3 NP_005092
rs2799070 ENST00000624652 ENSG00000187608 ISG15 inframe_deletion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
I checked this question, but it doesn't answer my question, as mine involves a multidimensional array. How can I order my array ensembldb according to the ordered array ordered_array?
Thank you.
Edit 1: Adding code as requested by @anubhava
# Indexed array (-a); -A would create an associative array, which has no guaranteed order
declare -a ordered_array
ordered_array[0]="frameshift_variant"
ordered_array[1]="missense_variant"
ordered_array[2]="inframe_insertion"
ordered_array[3]="inframe_deletion"
while read -r LINE; do
    chrom=$(echo -e "$LINE" | cut -f1 -d$'\t' | sed 's/^chr//g')
    pos=$(echo -e "$LINE" | cut -f2 -d$'\t')
    ref=$(echo -e "$LINE" | cut -f3 -d$'\t')
    alt=$(echo -e "$LINE" | cut -f4 -d$'\t')
    LINE=$(echo -e "$LINE" | sed 's/^chr//g')
    ensembldb=$(echo "PREPARE stmt1 FROM 'SELECT Annotated_ID, Transcript, Gene_ID, Gene_name, Consequence, Swissprot_ID, AA_change, Biotype, Gene_description, RefSeq_mRNA, RefSeq_peptide FROM SNP_annot.37_annot_ensembl_89_full_descr where chrom = \"$chrom\" and Start = \"$pos\" and Local_alleles = \"$ref/$alt\"'; execute stmt1;" | mariadb -A -N)
    readarray -t array <<< "$ensembldb"
    pos19=$(echo "PREPARE stmt2 FROM 'select hg19_pos from SNP_annot.mut_convert_pos where chrom = \"$chrom\" and hg38_pos = \"$pos\"'; execute stmt2;" | mariadb -A -N)
    hits=$(echo -e "$ensembldb" | wc -l)
    [ -n "$pos19" ] && awk -v line="$LINE" -v pos="$pos19" -v ensembl="$ensembldb" -v hit="$hits" 'BEGIN {print line"\t"ensembl"\t"hit"\t"pos}'
done
1. The variable LINE has rows like this:
CHROM POS REF ALT QUAL DP Genotype
chr1 16495 G C 1722.77 252 G/C
chr1 16719 T A 145.77 189 T/A
chr1 16841 G T 701.77 521 G/T
chr1 17626 G A 154.77 124 G/A
2. The variable ensembldb holds the result of a MySQL query that returns multiple rows, which is then converted to an array. It contains the rows that I want to sort according to ordered_array, so that I can pick the first row that matches ordered_array.
Reputation: 1809
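This awk might work for you, where file_a holds the rows of ensembldb and file_b holds the entries of ordered_array, one per line: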
awk 'FNR==NR{a[$5]=$0;next}{print a[$1]}' file_a file_b
If a and b are really arrays:
readarray -t a < <(awk 'FNR==NR{a[$5]=$0;next}{print a[$1]}' <(printf '%s\n' "${a[@]}") <(printf '%s\n' "${b[@]}"))
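If the data is already in bash arrays and only the highest-priority row is needed, a plain nested loop may be enough. The following is a minimal sketch rather than a tuned solution: it assumes the rows are tab-separated with no empty fields before the consequence column, uses the array names from the question (array and ordered_array), and runs on made-up, shortened rows. The names sorted and best are introduced here for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: order the rows in "array" by the priorities in "ordered_array".
ordered_array=(frameshift_variant missense_variant inframe_insertion inframe_deletion)

# Made-up, shortened rows; field 5 (tab-separated) is the consequence.
array=(
  $'rs1\tT1\tG1\tISG15\tinframe_insertion'
  $'rs1\tT2\tG2\tAL645608.2\tmissense_variant'
  $'rs1\tT3\tG1\tISG15\tinframe_deletion'
  $'rs1\tT4\tG1\tISG15\tframeshift_variant'
)

sorted=()
for wanted in "${ordered_array[@]}"; do
  for row in "${array[@]}"; do
    # Split the row on tabs and keep only the fifth field.
    IFS=$'\t' read -r _ _ _ _ consequence _ <<< "$row"
    if [[ $consequence == "$wanted" ]]; then
      sorted+=("$row")
    fi
  done
done

# The first element is the row with the highest-priority consequence
# (empty if no row matched any entry of ordered_array).
best=${sorted[0]-}

printf '%s\n' "${sorted[@]}"
```

Rows whose consequence is not listed in ordered_array are dropped by this loop. The cost grows with the product of the two array sizes, which should be negligible for a handful of rows per variant.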