user3458909

Reputation: 43

Creating an array with awk and passing it to a second awk operation

I have a column-formatted file. I want to print every line that does not contain the string SOL and, of the lines that do contain SOL, only those whose 5th column is <1.2 or >4.8.

The file is structured as: MOLECULENAME ATOMNAME X Y Z

Example:

  151SOL     OW 6554   5.160   2.323   4.956
  151SOL    HW1 6555   5.188   2.254   4.690  ----> this atom is out of the threshold, but it still needs to be printed
  151SOL    HW2 6556   5.115   2.279   5.034

My idea is to save a vector with all the MOLECULENAME values that I want, then tell awk to match every MOLECULENAME saved in vector "a" against the file and print the complete lines. (If I only run the first awk, I end up with bad atom linkage near the threshold.)

The problem is that I have to pass the vector from the first awk to the second. I tried it like this with a[], but of course it doesn't work.

How can I do this?

Here is the code I have so far:

a[] = (awk 'BEGIN{i=0} $1 !~ /SOL/{a[i]=$1;i++}; /SOL/ && $5 > 4.8 {a[i]=$1;i++};/SOL/ &&$5<1.2 {a[i]=$1;i++}')

awk -v a="$a[$i]" 'BEGIN{i=0} $1 ~ $a[i] {if (NR>6540) {for (j=0;j<3;j++) {print $0}} else {print $0} 

Upvotes: 0

Views: 117

Answers (3)

user3458909

Reputation: 43

SOLVED! Thanks to all; here is how I solved it.

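    #!/bin/bash
    file=$1
    awk '
    # Print the buffered SOL molecule if any of its atoms is out of range.
    # Thresholds are kept as posted (the question used 4.8 as the upper bound).
    function emit(    k, show) {
        for (k = 0; k < n; k++) if (comp[k] < 1.2 || comp[k] > 5) show = 1
        if (show) for (k = 0; k < n; k++) print mol[k]
        n = 0
    }
    BEGIN          { molecola = ""; n = 0 }
    $1 !~ /SOL/    { emit(); molecola = ""; print; next }   # non-SOL lines pass through
    $1 != molecola { emit(); molecola = $1 }                # a new SOL molecule starts
                   { mol[n] = $0; comp[n] = $5; n++ }       # buffer the current SOL line
    END            { emit() }                               # flush the last molecule
    ' "$file"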

Upvotes: 0

vicsana1

Reputation: 369

You can put all lines with the same molecule name on one row by running sort on the file and then this awk script, which uses printf to keep printing on the same line until a different molecule name is found; then a new line starts. The second awk script detects which molecule names have 3 valid lines in the original file. I hope this helps you solve your problem.

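sort your_file |
awk 'BEGIN { molname = "" }
     # Keep non-SOL lines and SOL lines whose 5th column is out of range.
     $0 !~ /SOL/ || $5 < 1.2 || $5 > 4.8 {
         if ($1 != molname) { printf("\n"); molname = $1 }   # new molecule: start a new row
         for (i = 1; i <= NF; i++) printf("%s ", $i)          # append this line to the row
     }' |
awk 'NF > 12'   # with 6 fields per line, more than 12 fields means 3 lines were kept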

Upvotes: 1

Josh Jolly

Reputation: 11786

awk '!/SOL/ || $5 < 1.2 || $5 > 4.8' inputfile.txt

Print (default behaviour) lines where:

  • "SOL" is not found
  • SOL is found and fifth column < 1.2
  • SOL is found and fifth column > 4.8
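
The condition above is evaluated line by line, so a SOL molecule with only some atoms out of range would be printed partially. If whole molecules have to be kept, as the question describes, one possible sketch is to let a single awk read the file twice instead of passing an array between two awk processes: the first pass records the molecule names to keep, the second prints them. The file name `sample.txt`, the `LIG` molecule, the function name `keep_whole_molecules`, and the variables `col`, `lo` and `hi` are assumptions made for illustration. The sketch also assumes every molecule name is unique, and it tests column 6 because that is where the last coordinate sits in the sample layout (the question says column 5; pass whichever applies).

```shell
#!/bin/sh
# Hypothetical sample data in the layout shown in the question;
# "LIG" is an invented non-SOL molecule.
cat > sample.txt <<'EOF'
  150LIG     C1 6550   5.000   2.000   3.000
  151SOL     OW 6554   5.160   2.323   4.956
  151SOL    HW1 6555   5.188   2.254   4.690
  151SOL    HW2 6556   5.115   2.279   5.034
  152SOL     OW 6557   5.000   2.500   3.000
  152SOL    HW1 6558   5.050   2.550   3.050
  152SOL    HW2 6559   4.950   2.450   2.950
EOF

# Usage: keep_whole_molecules FILE [COLUMN]
# COLUMN is the field to test (assumed default: 6, matching the sample above).
keep_whole_molecules() {
    awk -v col="${2:-6}" -v lo=1.2 -v hi=4.8 '
        # Pass 1: NR == FNR is true only while the first file argument is read.
        # Remember every SOL molecule with at least one atom outside [lo, hi].
        NR == FNR {
            if ($1 ~ /SOL/ && ($col < lo || $col > hi)) keep[$1] = 1
            next
        }
        # Pass 2: print non-SOL lines and every line of a remembered molecule.
        $1 !~ /SOL/ || ($1 in keep)
    ' "$1" "$1"
}

keep_whole_molecules sample.txt 6
```

On this sample the LIG line and all three 151SOL lines are printed, including HW1 at 4.690, while 152SOL is dropped entirely. Naming the file twice on the command line is what makes the `NR == FNR` test separate the two passes, and the input order is preserved because nothing is sorted.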

Upvotes: 0
