Khaleesi95

Reputation: 311

How to remove "chr" from a VCF file using bash

I need to remove 'chr' from my VCF file. This is what the file looks like:

#CHROM  POS  
chr1   10570
chr1   10574
chr1   10654

I want to get the following:

#CHROM  POS  
   1   10570
   1   10574
   1   10654

I have tried several approaches, like the following:

awk '{gsub(/^chr/,""); print}' your.vcf > no_chr.vcf
sed 's/^chr//'
sed 's:chr::g'
awk '{gsub(/\chr/, "")}1'
perl -pe  's/^chr//g'
sed '/^##/! s/chr//'

But none of them work. Any suggestions? Thank you!

Upvotes: 0

Views: 1103

Answers (5)

rndy

Reputation: 1

When editing VCF files with awk, I've found it easier to refer to the column than to use a regex, since the first 8 columns of a VCF file are fixed (#CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).

Here's a solution with awk that substitutes "chr" with "" in column 1. (I used sub() rather than gsub() since there's only one instance of "chr" to replace on each line.)

awk '{ sub("chr", "", $1); print }' your.vcf > no_chr.vcf

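Note that this code can change your delimiter. By default awk uses whitespace as the input field separator and a single space as the output field separator, and as soon as you modify a field ($1 here), awk rebuilds the whole line using the output field separator.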
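A quick way to see this: a made-up tab-delimited record comes out space-separated once $1 is modified. The file names sample.vcf and default_ofs.txt are just for the demo.

```shell
# Made-up, tab-delimited sample record
printf 'chr1\t10570\t.\tA\tG\n' > sample.vcf

# Default FS (runs of blanks) and OFS (one space): because $1 is modified,
# awk rebuilds the whole line with OFS, so the tabs become single spaces
awk '{ sub("chr", "", $1); print }' sample.vcf > default_ofs.txt

cat default_ofs.txt   # prints: 1 10570 . A G
```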

Most VCF files I've worked with are tab-delimited. In order to use tab as the delimiter for both input and output, you need to specify the input field separator (FS) and output field separator (OFS) at the beginning of your code.

Here's the same solution using tab as the field separator:

awk 'BEGIN { FS = OFS = "\t" } { sub("chr", "", $1); print }' your.vcf > no_chr.vcf
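A variation on the same idea, in case you only want to touch a leading "chr" and leave header lines alone: anchor the pattern with ^ and print lines starting with # unchanged. This is a sketch assuming a tab-delimited file; the test input is made up. Note that it also leaves ##contig=<ID=chr1,...> header lines untouched, so those would need separate handling if your tools check them.

```shell
# Made-up, tab-delimited test input: one header line plus two records
printf '#CHROM\tPOS\nchr1\t10570\nchr1\t10574\n' > your.vcf

# - FS = OFS = "\t" keeps the tab delimiters intact
# - lines starting with "#" (meta-information and header) are printed as-is
# - /^chr/ is anchored, so only a leading "chr" in column 1 is removed
awk 'BEGIN { FS = OFS = "\t" }
     /^#/ { print; next }
     { sub(/^chr/, "", $1); print }' your.vcf > no_chr.vcf

cat no_chr.vcf
```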

Upvotes: 0

user438383

Reputation: 6206

For beginners, it is much safer to use dedicated tools such as bcftools rather than general-purpose Unix text tools; it's easy to end up corrupting your file.

echo "chr1 1" >> rename_chrs.txt
bcftools annotate --rename-chrs rename_chrs.txt in.vcf > out.vcf
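The echo line above only maps chr1. bcftools annotate --rename-chrs expects whitespace-separated "old_name new_name" pairs, one per line, so for a file with many chromosomes one option is to build the map from the names that actually occur in the data. A sketch, assuming an uncompressed VCF named in.vcf (a .vcf.gz would need to be decompressed first); the sample data is made up:

```shell
# Made-up, tab-delimited test input; replace with your own in.vcf
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\nchr1\t10570\nchr2\t200\nchrX\t300\nchr1\t10654\n' > in.vcf

# Collect the distinct CHROM values from the data lines (skipping "#" lines)
# and write an "old_name new_name" pair for each one that starts with "chr"
awk '!/^#/ { print $1 }' in.vcf | sort -u \
  | sed -n 's/^chr\(.*\)/chr\1 \1/p' > rename_chrs.txt

cat rename_chrs.txt
# then, as above:
# bcftools annotate --rename-chrs rename_chrs.txt in.vcf > out.vcf
```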

Upvotes: 4

Timur Shtatland

Reputation: 12347

Use this Perl one-liner:

perl -i.bak -pe  's/^chr//' your.vcf

And if you want to remove every "chr" anywhere in the line (note that this also affects header lines and any other field that contains "chr"):

perl -i.bak -pe  's/chr//g' your.vcf

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.

s/^chr// : Replace chr at the beginning of the string (here, the line) with an empty string. There is no need to use the g modifier (match the pattern repeatedly), since there is only one replacement per line.


Complete example with input and output:

Create test input:

cat > your.vcf <<EOF
#CHROM  POS  
chr1   10570
chr1   10574
chr1   10654
EOF

Confirm using cat and hexdump that there are no special characters:

cat your.vcf

Prints:

#CHROM  POS  
chr1   10570
chr1   10574
chr1   10654

hexdump -C your.vcf

Prints:

00000000  23 43 48 52 4f 4d 20 20  50 4f 53 20 20 0a 63 68  |#CHROM  POS  .ch|
00000010  72 31 20 20 20 31 30 35  37 30 0a 63 68 72 31 20  |r1   10570.chr1 |
00000020  20 20 31 30 35 37 34 0a  63 68 72 31 20 20 20 31  |  10574.chr1   1|
00000030  30 36 35 34 0a                                    |0654.|
00000035

Remove chr:

perl -i.bak -pe  's/^chr//' your.vcf

Check the file:

cat your.vcf

Prints:

#CHROM  POS  
1   10570
1   10574
1   10654

Upvotes: 1

sseLtaH

Reputation: 11227

Using GNU sed:

$ sed -E '/^#/! {:a;s/[a-z]([0-9])?/ \1/;ta}' input_file
#CHROM  POS
   1   10570
   1   10574
   1   10654
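The one-liner loops until no lowercase letter is left on a data line; each pass turns one letter into a space, so "chr1" becomes three spaces followed by "1" and the columns stay aligned. Any other lowercase letters on a data line would be replaced as well. Here is the same script spread over several lines with comments, assuming GNU sed; the input file is made up:

```shell
# Made-up input matching the question (space-aligned columns)
printf '#CHROM  POS\nchr1   10570\nchr1   10574\n' > input_file

# Same script as the one-liner, split over lines:
#   /^#/!  only act on lines that do not start with "#"
#   :a     label marking the start of the loop
#   s/...  replace the first lowercase letter (plus an optional digit right
#          after it) with a space followed by that digit
#   ta     jump back to :a as long as the substitution succeeded
sed -E '/^#/! {
  :a
  s/[a-z]([0-9])?/ \1/
  ta
}' input_file > no_chr.txt

cat no_chr.txt
```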

Upvotes: 0

Barmar

Reputation: 781141

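If you want the columns to stay aligned exactly as in your expected output, replace it with 3 spaces. Keep in mind that in a tab-delimited VCF this leaves leading spaces in the CHROM column, which downstream tools may not accept.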

sed 's/^chr/   /' your.vcf > no_chr.vcf

Upvotes: 1
