Reputation: 315
I'm working with a tab-delimited file (a VCF file) with a large number of columns (a small example is below):
1 13979 S01_13979 C G . . PR GT ./. ./.
1 13980 S01_13980 G A . . PR GT ./. ./.
1 13986 S01_13986 G A . . PR GT ./. ./.
1 14023 S01_14023 G A . . PR GT 0/0 ./.
1 15671 S01_15671 A T . . PR GT 0/0 0/0
1 60519 S01_60519 A G . . PR GT 0/0 0/0
1 60531 S01_60531 T C . . PR GT 0/0 0/0
1 63378 S01_63378 A G . . PR GT 1/1 ./.
1 96934 S01_96934 C T . . PR GT 0/0 0/0
1 96938 S01_96938 C T . . PR GT 0/0 0/0
In the first column (chromosome name) I have numbers from 1 to 26 (e.g. 1, 2, ..., 25, 26). I'd like to add the prefix HanXRQChr0 to the numbers from 1 to 9, and the prefix HanXRQChr to the numbers from 10 to 26. The values in all other columns should remain unchanged.
So far I have tried a sed solution, but the output is not completely correct (the last pipe doesn't work):
cat test.vcf | sed -r '/^[1-9]/ s/^[1-9]/HanXRQChr0&/' | sed -r '/^[1-9]/ s/^[0-9]{2}/HanXRQChr&/' > test-1.vcf
How can I do that with AWK? I think AWK would be safer to use in my case, since it can directly change only the first column of the file.
Upvotes: 0
Views: 629
Reputation: 133538
Could you please try the following?
awk -v first="HanXRQChr0" -v second="HanXRQChr" '
BEGIN{ FS=OFS="\t" }
$1>=1 && $1<=9{
  $1=first $1
}
$1>=10 && $1<=26{
  $1=second $1
}
1' Input_file
You can change the values of the variables first and second as needed. The script checks the first field: if its value is from 1 to 9, it prepends the value of first; if its value is from 10 to 26, it prepends the value of second. The BEGIN block sets the input and output field separators to a tab, so that reassigning $1 does not turn the tab delimiters into spaces.
Explanation: the same code with comments.
awk -v first="HanXRQChr0" -v second="HanXRQChr" '   ##Create the variables first and second; set their values as needed.
BEGIN{ FS=OFS="\t" }                                 ##Use a tab as input and output field separator so the tabs survive when $1 is reassigned.
$1>=1 && $1<=9{                                      ##If the first field is greater than or equal to 1 and less than or equal to 9, do the following.
  $1=first $1                                        ##Rebuild the first field with the value of first in front of it.
}                                                    ##Close this condition block.
$1>=10 && $1<=26{                                    ##If the first field is greater than or equal to 10 and less than or equal to 26, do the following.
  $1=second $1                                       ##Rebuild the first field with the value of second in front of it.
}                                                    ##Close this condition block.
1                                                    ##A bare 1 is an always-true pattern, so the (possibly modified) line is printed.
' Input_file                                         ##The name of the input file.
Upvotes: 2
Reputation: 67507
Since you didn't provide sample input, here is a script with mock data:
$ seq 1 3 30 | awk '1<=$1 && $1<=26 {$1=sprintf("HanXRQChr%02d",$1)}1'
HanXRQChr01
HanXRQChr04
HanXRQChr07
HanXRQChr10
HanXRQChr13
HanXRQChr16
HanXRQChr19
HanXRQChr22
HanXRQChr25
28
Note that 28 escapes the prefixing logic.
To prevent the tab delimiters from being converted to spaces, add a BEGIN block at the beginning:
$ awk 'BEGIN{FS=OFS="\t"} ...
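Putting the two pieces together, a minimal sketch could look like the one below. The file names sample.vcf and sample.renamed.vcf are hypothetical, the three-line sample is made up for illustration, and the extra regex test on $1 is an assumption added here so that header lines such as #CHROM and any non-numeric chromosome names pass through untouched.

```shell
# Build a tiny tab-delimited sample (hypothetical file name and contents).
printf '1\t13979\tS01_13979\tC\tG\n12\t14023\tS01_14023\tG\tA\n#CHROM\tPOS\tID\tREF\tALT\n' > sample.vcf

# FS=OFS="\t" keeps the tab delimiters when $1 is reassigned.
# The regex restricts the change to purely numeric chromosome names,
# and the range test limits it to 1..26; %02d zero-pads 1..9.
awk 'BEGIN { FS = OFS = "\t" }
     $1 ~ /^[0-9]+$/ && $1 >= 1 && $1 <= 26 { $1 = sprintf("HanXRQChr%02d", $1) }
     1' sample.vcf > sample.renamed.vcf
```

Because the header line never has its first field reassigned, awk prints it exactly as it was read; only the matching data lines are rebuilt with tabs as the output separator.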
Upvotes: 2