Reputation: 3022
I am trying to use awk to split the file, skipping the header, into either an 8-column or a 6-column output. I am not sure I did the split correctly, though, as I need to split $2 first on the ":" and then on the "-". The desired output of each awk command is below; one or the other is used depending on the situation. Thank you :).
file (tab-delimited)
Gene Position Strand
SMARCB1 22:24133967-24133967 +
RB1 13:49037865-49037865 -
SMARCB1 22:24176357-24176357 +
awk
awk -F'\t' -v OFS="\t" 'NR>1{split($2,a,":"); print a[1],a[2],a[3],"chr"$2,"0",$3,"GENE_ID="$1}'
8-column desired output tab-delimited
chr22 24133967 24133967 chr22:24133967-24133967 0 + . GENE_ID=SMARCB1
chr13 49037865 49037865 chr13:49037865-49037865 0 - . GENE_ID=RB1
chr22 24176357 24176357 chr22:24176357-24176357 0 + . GENE_ID=SMARCB1
awk
awk -F'\t' -v OFS="\t" 'NR>1{split($2,a,":"); print a[1],a[2],a[3],"chr"$2,".",$1,}'
6-column desired output tab-delimited
chr22 24133967 24133967 chr22:24133967-24133967 . SMARCB1
chr13 49037865 49037865 chr13:49037865-49037865 . RB1
chr22 24176357 24176357 chr22:24176357-24176357 . SMARCB1
Upvotes: 0
Views: 437
Reputation: 92854
Extended approach:
For 6-column output:
awk -v c=6 'BEGIN{ FS=OFS="\t" }NR>1{ split($2,a,":|-"); k="chr";
printf("%s\t%d\t%d\t%s\t",k a[1],a[2],a[3],k $2);
if (c==6) print ".",$1; else print "0",$3,".","GENE_ID="$1 }' file
The output:
chr22 24133967 24133967 chr22:24133967-24133967 . SMARCB1
chr13 49037865 49037865 chr13:49037865-49037865 . RB1
chr22 24176357 24176357 chr22:24176357-24176357 . SMARCB1
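As a quick check of what the split on ":|-" produces, here is a minimal sketch run on a single Position value. The helper name split_position is hypothetical and only for illustration; the bracket expression /[:-]/ is an equivalent way of writing the same separator.

```shell
# split_position: read "chrom:start-end" values on stdin and print
# the number of pieces followed by the pieces themselves.
# (Hypothetical helper, not part of the answer above.)
split_position() {
  awk '{ n = split($0, a, /[:-]/); print n, a[1], a[2], a[3] }'
}

echo '22:24133967-24133967' | split_position
# prints: 3 22 24133967 24133967
```

Because the separator matches either character, one split call fills a[1] (chromosome), a[2] (start) and a[3] (end), which is why a[3] was empty when splitting on ":" alone.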
For 8-column output (passing the column count via the -v c=<number> variable):
awk -v c=8 'BEGIN{ FS=OFS="\t" }NR>1{ split($2,a,":|-"); k="chr";
printf("%s\t%d\t%d\t%s\t",k a[1],a[2],a[3],k $2);
if (c==6) print ".",$1; else print "0",$3,".","GENE_ID="$1 }' file
The output:
chr22 24133967 24133967 chr22:24133967-24133967 0 + . GENE_ID=SMARCB1
chr13 49037865 49037865 chr13:49037865-49037865 0 - . GENE_ID=RB1
chr22 24176357 24176357 chr22:24176357-24176357 0 + . GENE_ID=SMARCB1
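If you prefer the literal two-step split described in the question (first on ":", then on "-"), the same logic can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name reformat_positions is hypothetical, it reads the tab-delimited data on stdin, and the sample rows are copied from the question.

```shell
# reformat_positions COLS: COLS is 6 or 8; input is read from stdin.
# (Hypothetical wrapper name; the awk logic mirrors the commands above.)
reformat_positions() {
  awk -v c="$1" 'BEGIN { FS = OFS = "\t" }
  NR > 1 {                    # skip the header line
    split($2, p, ":")         # p[1] = chromosome, p[2] = "start-end"
    split(p[2], r, "-")       # r[1] = start, r[2] = end
    chr = "chr" p[1]
    if (c == 6) print chr, r[1], r[2], "chr" $2, ".", $1
    else        print chr, r[1], r[2], "chr" $2, "0", $3, ".", "GENE_ID=" $1
  }'
}

# Sample input taken from the question (header plus two rows).
sample() {
  printf 'Gene\tPosition\tStrand\n'
  printf 'SMARCB1\t22:24133967-24133967\t+\n'
  printf 'RB1\t13:49037865-49037865\t-\n'
}

sample | reformat_positions 6
sample | reformat_positions 8
```

With a real file, the call would be reformat_positions 8 < file. The two-step split gives the same fields as the single split on ":|-"; it is just more explicit about which delimiter is handled when.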
Upvotes: 2