I have tried hard to sort this problem out, but it seems no one else has faced it before. I unzipped the fastq file from the 1000G with:
gunzip -c hs37d5.fa.gz | awk '{if(NR%4==1) {printf(">%s\n",substr($0,2));} else if(NR%4==2) print;}' > ref.fa
The unzipped file, though, has some "trailing garbage", and it causes the following error:
Exception in thread "main" picard.PicardException: Sequence name appears more than once in reference: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
when trying to run:
java -jar picard.jar CreateSequenceDictionary R=ref.fasta O=ref.dict
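To narrow down which names Picard is complaining about, a quick check like the one below could list every header name that occurs more than once in the file. This is only a sketch: `dup_fasta_headers` is a hypothetical helper name, and it assumes the sequence name is the first whitespace-delimited token after the `>` on each header line.

```shell
# Hypothetical helper: list header names that occur more than once in a FASTA file.
# Assumption: the sequence name is the first whitespace-delimited token after '>'.
dup_fasta_headers() {
    # Keep only header lines, strip the leading '>', keep the first token,
    # then sort so that uniq -d can report names seen at least twice.
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d
}
```

Running `dup_fasta_headers ref.fa | head` on the file produced by the awk step should show whether runs of N ended up as header lines.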
If someone could give me a little help, it would be much appreciated.