I am trying to run a tool ("TAPES") for ACMG-based variant prioritization of a VEP-annotated VCF file. The tool works perfectly with the toy_dataset, which contains another VEP-annotated VCF file in the same format, and produces the expected result. However, I get this error when I run it on another file:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 4209: invalid continuation byte
Could someone please help me with this error?
Also, are there any other ACMG-criteria assignment or variant prioritization tools for VEP-annotated VCF files?
Thanks
I have tried passing the encoding to the pd.read_csv command. I have also tried codecs and automatic detection of the encoding with chardet. The encodings I have tried are ascii, utf-8, latin-1 and ISO-8859-1:
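import codecs
import chardet
import pandas as pd
# Attempt 1: pass the encoding explicitly
df = pd.read_csv('file.csv', encoding='ISO-8859-1')
# Attempt 2: detect the encoding with chardet (needs the raw bytes)
with open('file.csv', 'rb') as f:
    result = chardet.detect(f.read())
df = pd.read_csv('file.csv', encoding=result['encoding'])
# Attempt 3: open the file through codecs
with codecs.open('file.csv', 'r', encoding='ISO-8859-1') as f:
    df = pd.read_csv(f)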
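To narrow down which part of the input triggers the error, one option is to scan the file in binary mode and report the lines that are not valid UTF-8. The following is only a minimal sketch, assuming the VCF is an uncompressed text file; the function name `find_non_utf8_lines` and the path `input.vcf` are hypothetical, and the offset it reports is relative to each line, so it will not necessarily match the position 4209 in the traceback.

```python
def find_non_utf8_lines(path, limit=10):
    """Return (line_number, offset_in_line, raw_line) for lines that fail UTF-8 decoding.

    Hypothetical helper for illustration; it is not part of TAPES or pandas.
    """
    bad = []
    # Binary mode: bytes are read as-is, so nothing is decoded implicitly.
    with open(path, "rb") as handle:
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the index of the first undecodable byte in this line.
                bad.append((line_number, err.start, raw))
                if len(bad) >= limit:
                    break
    return bad

# Example usage (the path is an assumption):
# for line_number, offset, raw in find_non_utf8_lines("input.vcf"):
#     print(line_number, offset, raw[max(0, offset - 20):offset + 20])
```

Printing the bytes around each reported offset should show which annotation field carries the non-UTF-8 character (for example an accented letter stored as a single latin-1 byte such as 0xeb).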