Paul

Reputation: 321

Split rows and skip empty places in awk

I have data like this:

AA_MAF  EA_MAF  ExAC_MAF
-   -   -
G:0.001445  G:0.0044    -
-   -   -
-   -   C:0.277
C:0.1984    C:0.1874    C:0.176
G:0.9296    G:0.9994    G:0.993&C:8.237e-06
C:0.9287    C:0.9994    C:0.993&T:5.767e-05

I need to split every column on : and &, that is, separate each letter (A, C, G, T) from its frequency (the number that follows the letter). This seems very complicated and I am not sure whether it can be solved.

The required output is tab-separated:

AA_MAF  AA_MAF  EA_MAF  EA_MAF  ExAC_MAF    ExAC_MAF    ExAC_MAF    ExAC_MAF
-       -       -   -   -   -
G   0.001445    G   0.0044  -   -   -   -
-       -       -   -   -   -
-       -       C   0.277   -   -
C   0.1984  C   0.1874  C   0.176   -   -
G   0.9296  G   0.9994  G   0.993   C   8.24E-006
C   0.9287  C   0.9994  C   0.993   T   5.77E-005

If a value is missing, substitute - in its place.

My attempt was:

awk -v OFS="\t" '{{for(i=1; i<=NF; i++) sub(":","\t",$i)}; sub ("&","\t",$i) 1'}' IN_FILE |  awk 'BEGIN { FS = OFS = "\t" } { for(i=1; i<=NF; i++) if($i ~ /^ *$/) $i = "-" }1'

Upvotes: 0

Views: 195

Answers (2)

NeronLeVelu

Reputation: 10039

awk '{for (i=1;i<=NF;i++) {
        v1 = v2 = $i
        if ($i ~ /:/ ) { gsub(/:.*/, "", v1); gsub( /.*:/, "", v2)}
        printf( "%s%s%s%s", v1, OFS, v2, OFS)
        }
      print ""
      }' YourFile

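For each field, check whether it contains ":". If it does, split it into the part before the colon and the part after it; if not, duplicate the value. Then print both values with the separator between them, and repeat until the last field. This is done for every line, including the header. Note that OFS is the default single space here (add -v OFS="\t" for tab-separated output), and that the code splits only on ":", so a field holding two alleles joined by "&" is not handled correctly.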
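
To get closer to the layout in the question, where ExAC_MAF can hold two allele/frequency pairs, the same per-field idea can be extended to split on both ":" and "&" and to pad each column to a fixed width. This is only a sketch: the function name split_maf and the widths (2, 2 and 4 output columns) are assumptions read off the expected output, not something the original answer provides.

```shell
# Hypothetical helper: reads whitespace-separated rows from stdin or from files.
split_maf() {
    awk -v OFS='\t' '
    # Assumed number of output columns for each input column.
    BEGIN { width[1] = 2; width[2] = 2; width[3] = 4 }
    {
        line = ""
        for (i = 1; i <= NF; i++) {
            w = (i in width) ? width[i] : 2
            if (NR == 1) {
                # Header row: repeat the column name w times.
                for (j = 1; j <= w; j++) part[j] = $i
                n = w
            } else {
                # Data row: split "G:0.993&C:8.2e-06" into G, 0.993, C, 8.2e-06.
                n = split($i, part, /[:&]/)
            }
            # Emit w values, using "-" for anything missing.
            for (j = 1; j <= w; j++) {
                val = (j <= n && part[j] != "") ? part[j] : "-"
                line = line (line == "" ? "" : OFS) val
            }
        }
        print line
    }' "$@"
}

# Example call on three rows taken from the question.
printf '%s\n' \
    'AA_MAF EA_MAF ExAC_MAF' \
    '- - C:0.277' \
    'G:0.9296 G:0.9994 G:0.993&C:8.237e-06' | split_maf
```

Unlike padding at the end of the line, this keeps every value under its own header, so a lone "-" in AA_MAF becomes two dashes in the two AA_MAF columns. Numbers are passed through as text, so 8.237e-06 is not reformatted.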

Upvotes: 1

user000001

Reputation: 33387

If the trailing dashes are not required, you could use this command:

$ awk -F'[ \t:&]+' -v OFS='\t' '{$1=$1}1' file
AA_MAF  EA_MAF  ExAC_MAF
-   -   -
G   0.001445    G   0.0044  -
-   -   -
-   -   C   0.277
C   0.1984  C   0.1874  C   0.176
G   0.9296  G   0.9994  G   0.993   C   8.237e-06
C   0.9287  C   0.9994  C   0.993   T   5.767e-05

If you need the trailing dashes:

$ awk -F'[ \t:&]+' -v OFS='\t' '{$1=$1;for(i=NF+1;i<=8;i++)$i="-"}1' file
AA_MAF  EA_MAF  ExAC_MAF    -   -   -   -   -
-   -   -   -   -   -   -   -
G   0.001445    G   0.0044  -   -   -   -
-   -   -   -   -   -   -   -
-   -   C   0.277   -   -   -   -
C   0.1984  C   0.1874  C   0.176   -   -
G   0.9296  G   0.9994  G   0.993   C   8.237e-06
C   0.9287  C   0.9994  C   0.993   T   5.767e-05
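
If the widest row is not known in advance, the hard-coded 8 could be replaced by a first pass over the file that records the largest field count. A minimal sketch, assuming the input is a regular file that can be read twice; the function name pad_rows is made up for illustration:

```shell
# Hypothetical helper: $1 is the input file, which awk reads twice.
pad_rows() {
    awk -F'[ \t:&]+' -v OFS='\t' '
        # Pass 1 (NR == FNR only while reading the first copy): find the widest row.
        NR == FNR { if (NF > max) max = NF; next }
        # Pass 2: rebuild the record with tabs and pad it with "-" up to max fields.
        { $1 = $1; for (i = NF + 1; i <= max; i++) $i = "-" } 1
    ' "$1" "$1"
}

# Example call on a temporary file.
tmp=$(mktemp)
printf '%s\n' 'AA_MAF EA_MAF ExAC_MAF' '- - C:0.277' \
    'G:0.9296 G:0.9994 G:0.993&C:8.237e-06' > "$tmp"
pad_rows "$tmp"
rm -f "$tmp"
```

As with the command above, the dashes are appended at the end of each line, so shorter rows are padded but their values are not aligned under specific headers.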

Upvotes: 1
