Reputation: 321
I have data like this:
AA_MAF EA_MAF ExAC_MAF
- - -
G:0.001445 G:0.0044 -
- - -
- - C:0.277
C:0.1984 C:0.1874 C:0.176
G:0.9296 G:0.9994 G:0.993&C:8.237e-06
C:0.9287 C:0.9994 C:0.993&T:5.767e-05
I need to split every column on : and &, that is, separate each letter (A, C, G, T) from its frequency (the number that follows the letter). This looks very complicated and I am not sure whether it can be solved.
The required output is tab-separated:
AA_MAF AA_MAF EA_MAF EA_MAF ExAC_MAF ExAC_MAF ExAC_MAF ExAC_MAF
- - - - - -
G 0.001445 G 0.0044 - - - -
- - - - - -
- - C 0.277 - -
C 0.1984 C 0.1874 C 0.176 - -
G 0.9296 G 0.9994 G 0.993 C 8.24E-006
C 0.9287 C 0.9994 C 0.993 T 5.77E-005
If a field is empty, it should be filled with -.
My attempt was:
awk -v OFS="\t" '{{for(i=1; i<=NF; i++) sub(":","\t",$i)}; sub ("&","\t",$i) 1'}' IN_FILE | awk 'BEGIN { FS = OFS = "\t" } { for(i=1; i<=NF; i++) if($i ~ /^ *$/) $i = "-" }1'
Upvotes: 0
Views: 195
Reputation: 10039
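awk '{
  for (i = 1; i <= NF; i++) {
    # Default: no ":" in the field, so the same value is printed twice
    v1 = v2 = $i
    # Otherwise keep the text before the first ":" and the text after the last ":"
    if ($i ~ /:/) { gsub(/:.*/, "", v1); gsub(/.*:/, "", v2) }
    printf("%s%s%s%s", v1, OFS, v2, OFS)
  }
  print ""
}' YourFile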
For each field, check whether it contains ":". If it does, split the content into the part before and the part after the colon; if not, duplicate the value. Then print both values with a separator between them, and repeat until the last field. This is done for every line, including the header.
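Note that the loop above keeps only the text before the first ":" and after the last ":", so a field such as G:0.993&C:8.237e-06 would lose its middle values. Below is a minimal sketch of one way to extend it, assuming each field holds letter:frequency pairs joined by "&" and that tab-separated output is wanted. The name split_fields is a hypothetical helper, not part of the original answer.

```shell
# split_fields is a hypothetical helper name; it reads the table on stdin.
split_fields() {
  awk -v OFS='\t' '{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) {
      # A field may hold several letter:frequency pairs joined by "&"
      n = split($i, pairs, "&")
      for (j = 1; j <= n; j++) {
        # No ":" in the pair: print the value twice, as in the loop above
        v1 = v2 = pairs[j]
        if (pairs[j] ~ /:/) { sub(/:.*/, "", v1); sub(/.*:/, "", v2) }
        out = out sep v1 OFS v2
        sep = OFS
      }
    }
    print out
  }'
}

# Example call on one data line from the question
printf 'C:0.9287 C:0.9994 C:0.993&T:5.767e-05\n' | split_fields
```

Building the line in a variable and printing it once also avoids the trailing separator that the printf version leaves at the end of each line.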
Upvotes: 1
Reputation: 33387
If the trailing dashes are not required, you could use this command:
$ awk -F'[ \t:&]+' -v OFS='\t' '{$1=$1}1' file
AA_MAF EA_MAF ExAC_MAF
- - -
G 0.001445 G 0.0044 -
- - -
- - C 0.277
C 0.1984 C 0.1874 C 0.176
G 0.9296 G 0.9994 G 0.993 C 8.237e-06
C 0.9287 C 0.9994 C 0.993 T 5.767e-05
If you need the trailing dashes:
$ awk -F'[ \t:&]+' -v OFS='\t' '{$1=$1;for(i=NF+1;i<=8;i++)$i="-"}1' file
AA_MAF EA_MAF ExAC_MAF - - - - -
- - - - - - - -
G 0.001445 G 0.0044 - - - -
- - - - - - - -
- - C 0.277 - - - -
C 0.1984 C 0.1874 C 0.176 - -
G 0.9296 G 0.9994 G 0.993 C 8.237e-06
C 0.9287 C 0.9994 C 0.993 T 5.767e-05
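The command above appends the padding at the end of each line, so a value from ExAC_MAF can end up under the AA_MAF columns when earlier columns are "-". If the sub-fields have to stay under their own column, a two-pass variant is one possible approach. This is only a sketch under several assumptions: the input is whitespace-separated, the first line is the header, and the file can be read twice. The name pad_columns is hypothetical.

```shell
# pad_columns is a hypothetical helper; "$1" is the input file, which is read twice.
pad_columns() {
  awk -v OFS='\t' '
    # Pass 1: record the largest number of sub-fields seen in each column
    NR == FNR {
      for (i = 1; i <= NF; i++) {
        n = split($i, parts, /[:&]/)
        if (n > width[i]) width[i] = n
      }
      if (NF > ncol) ncol = NF
      next
    }
    # Pass 2: split each column and pad it up to the width of that column
    {
      line = ""; sep = ""
      for (i = 1; i <= ncol; i++) {
        n = split($i, parts, /[:&]/)
        for (j = 1; j <= width[i]; j++) {
          if (j <= n)        cell = parts[j]
          else if (FNR == 1) cell = parts[1]   # assumed header line: repeat the name
          else               cell = "-"        # data line: pad with a dash
          line = line sep cell
          sep = OFS
        }
      }
      print line
    }' "$1" "$1"
}
```

It would be called as pad_columns file. With the sample data from the question, the widths come out as 2, 2 and 4, so no column count has to be hard-coded.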
Upvotes: 1