M.Phillips

Reputation: 1

How to deal with trailing garbage caused by unzipping the hs37d5 fastq file

I have tried hard to sort this out, but it seems no one else has faced this problem before. I unzipped the fastq file from the 1000 Genomes Project (1000G):

gunzip -c hs37d5.fa.gz | awk '{if(NR%4==1) {printf(">%s\n",substr($0,2));} else if(NR%4==2) print;}' > ref.fa

The unzipped file, though, has some "trailing garbage", and it causes the following error:

"Exception in thread "main" picard.PicardException: Sequence name appears more than once in reference: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN"

when trying to run:

java -jar picard.jar CreateSequenceDictionary R=ref.fasta O=ref.dict

If someone could give me a little help, it would be much appreciated.
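
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the `.fa.gz` extension suggests that hs37d5 is already a FASTA file, not FASTQ. If so, the awk one-liner above treats it as 4-line FASTQ records, so every fourth sequence line loses its first character and becomes a `>` header. Runs of N lines would then produce many identical header names, which matches the Picard message. The "trailing garbage" note from gunzip is, as far as I know, only a warning that comes after a successful decompression. The toy example below uses a hypothetical `demo.fa.gz` stand-in (not the real reference) to compare the awk conversion with plain decompression.

```shell
tmp=$(mktemp -d)

# Toy gzipped FASTA (hypothetical stand-in for hs37d5.fa.gz):
# one header line followed by fixed-width sequence lines.
printf '>1\nNNNNNNNN\nNNNNNNNN\nACGTACGT\nNNNNNNNN\nNNNNNNNN\nNNNNNNNN\nACGTACGT\nNNNNNNNN\nNNNNNNNN\n' \
  | gzip -c > "$tmp/demo.fa.gz"

# FASTQ-style conversion from the question: every 4th line becomes a header.
gunzip -c "$tmp/demo.fa.gz" \
  | awk '{if(NR%4==1) {printf(">%s\n",substr($0,2));} else if(NR%4==2) print;}' \
  > "$tmp/mangled.fa"

# Plain decompression: the input is already FASTA, so no conversion is applied.
gunzip -c "$tmp/demo.fa.gz" > "$tmp/ref.fa"

# Count header names that occur more than once in each file.
dup_mangled=$(grep '^>' "$tmp/mangled.fa" | sort | uniq -d | wc -l | tr -d ' ')
dup_plain=$(grep '^>' "$tmp/ref.fa" | sort | uniq -d | wc -l | tr -d ' ')
echo "duplicated header names after awk conversion: $dup_mangled"
echo "duplicated header names after plain gunzip:   $dup_plain"
```

If this reading is right, running `gunzip -c hs37d5.fa.gz > ref.fa` on the real file and passing that same file name to Picard would be worth trying. Note that the commands above write `ref.fa` but give Picard `R=ref.fasta`, so the file names may also need to match.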

Upvotes: 0

Views: 817

Answers (0)