Reputation: 21
Hello, I am trying to read a CSV file. This is my code:
import pandas as pd
df = pd.read_csv("2021VAERSDATA.csv")
df.head()
This is the error I received:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 136: invalid start byte
I'm not sure how to correct this. Any advice would be greatly appreciated!
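The error says the file contains bytes that are not valid UTF-8. In UTF-8, any byte from 0x80 to 0xBF can only appear as a continuation byte inside a multi-byte character, so a 0xb0 at the start of a character is rejected. In Windows-1252 (cp1252) and ISO-8859-1 (Latin-1), the same byte is the degree sign °, which would not be surprising in free-text fields such as SYMPTOM_TEXT or LAB_DATA, though that is an assumption about this file. A minimal reproduction with a made-up byte string (the text "Temp 101°F" is hypothetical, not taken from the dataset):

```python
# Hypothetical sample: the bytes of "Temp 101°F" as written by a cp1252/Latin-1 tool
raw = b"Temp 101\xb0F"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Same failure as pandas reports: 0xb0 is a UTF-8 continuation byte, not a start byte
    print(exc)  # 'utf-8' codec can't decode byte 0xb0 in position 8: invalid start byte

# Single-byte Western encodings map 0xb0 to the degree sign
print(raw.decode("cp1252"))   # Temp 101°F
print(raw.decode("latin-1"))  # Temp 101°F
```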
Edit:
Here are the first 3 rows of my file:
VAERS_ID | RECVDATE | STATE | AGE_YRS | CAGE_YR | CAGE_MO | SEX | RPT_DATE | SYMPTOM_TEXT | DIED | DATEDIED | L_THREAT | ER_VISIT | HOSPITAL | HOSPDAYS | X_STAY | DISABLE | RECOVD | VAX_DATE | ONSET_DATE | NUMDAYS | LAB_DATA | V_ADMINBY | V_FUNDBY | OTHER_MEDS | CUR_ILL | HISTORY | PRIOR_VAX | SPLTTYPE | FORM_VERS | TODAYS_DATE | BIRTH_DEFECT | OFC_VISIT | ER_ED_VISIT | ALLERGIES |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
916600 | 1/1/2021 | TX | 33 | 33 | F | Right of epiglottis swelled up and hinder swallowing pictures taken Benadryl Tylenol taken | Y | 12/28/2020 | 12/30/2020 | 2 | None | PVT | None | None | None | 2 | 1/1/2021 | Y | Pcn and bee venom | |||||||||||||||
916601 | 1/1/2021 | CA | 73 | 73 | F | Approximately 30 min post vaccination administration patient demonstrated SOB and anxiousness. Assessed at time of event: Heart sounds normal, Lung sounds clear. Vitals within normal limits for patient. O2 91% on 3 liters NC Continuous flow. 2 consecutive nebulized albuterol treatments were administered. At approximately 1.5 hours post reaction, patients' SOB and anxiousness had subsided and the patient stated that they were feel "much better". | Y | 12/31/2020 | 12/31/2020 | 0 | SEN | Patient residing at nursing facility. See patients chart. | Patient residing at nursing facility. See patients chart. | Patient residing at nursing facility. See patients chart. | 2 | 1/1/2021 | Y | "Dairy" |
Upvotes: 2
Views: 3647
Reputation: 1689
I ran into the same issue while loading the same dataset. The code below should solve your problem.
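import pandas as pd
# encoding_errors needs pandas >= 1.3; 'ignore' silently drops bytes that are not valid UTF-8
df = pd.read_csv("2021VAERSDATA.csv", encoding_errors='ignore', low_memory=False)
df.head()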
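encoding_errors='ignore' makes the read succeed by discarding every byte that fails to decode, so a character such as a degree sign disappears without any warning. If the file was actually written in Windows-1252 or Latin-1 (an assumption, not something the question confirms), passing that encoding keeps the character instead. A small in-memory comparison using a hypothetical two-column CSV:

```python
import io

import pandas as pd

# Hypothetical CSV bytes; 0xb0 is the degree sign in cp1252/Latin-1
raw = b"VAERS_ID,SYMPTOM_TEXT\n916600,Temp 101\xb0F\n"

# Default UTF-8 decoding, undecodable bytes dropped (pandas >= 1.3)
dropped = pd.read_csv(io.BytesIO(raw), encoding_errors="ignore")

# Decode every byte with a single-byte encoding instead
kept = pd.read_csv(io.BytesIO(raw), encoding="latin-1")

print(dropped.loc[0, "SYMPTOM_TEXT"])  # Temp 101F
print(kept.loc[0, "SYMPTOM_TEXT"])     # Temp 101°F
```

For the real file the equivalent would be pd.read_csv("2021VAERSDATA.csv", encoding="latin-1", low_memory=False); whether Latin-1 or cp1252 is the better choice depends on how the file was produced.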
Upvotes: 1
Reputation: 12684
This is how I read my CSV file; please try it and let me know if it works.
import pandas as pd

with open('file.csv', encoding="utf8") as csv_file:
    df = pd.read_csv(csv_file)  # still raises UnicodeDecodeError if the file is not valid UTF-8
df.head()
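If you call open(file) without an encoding, Python 3 still opens it in text mode and decodes it with the platform's default encoding; only mode 'rb' returns raw bytes. Passing encoding explicitly is what controls how the bytes are decoded.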
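The snippet below is a quick way to see what open() returns in binary mode versus text mode with an explicit encoding, and which encoding a bare open(path) would fall back to on a given machine. The file contents are hypothetical (a single cp1252-encoded degree sign):

```python
import locale
import os
import tempfile

# Hypothetical file contents: one cp1252/Latin-1 byte (0xb0, the degree sign)
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"Temp 101\xb0F\n")
    path = tmp.name

try:
    # Binary mode: no decoding at all, you get bytes back
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode with an explicit encoding: bytes are decoded to str
    with open(path, encoding="cp1252") as f:
        as_text = f.read()

    print(type(as_bytes).__name__)         # bytes
    print(type(as_text).__name__)          # str
    print(as_text == "Temp 101\u00b0F\n")  # True

    # The encoding a plain open(path) would use here (platform/locale dependent)
    print(locale.getpreferredencoding(False))
finally:
    os.remove(path)
```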
EDITED:
Try the following encoding values: encoding='cp1252' or encoding='utf-16' or encoding='ISO-8859-1'
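Rather than trying these by hand, one option is to probe a few candidate encodings in order and use the first that decodes the whole file. The helper below (first_working_encoding is a hypothetical name, not a pandas function) is only a sketch. Order matters: Latin-1 (ISO-8859-1) maps every byte to some character and therefore never fails, so it belongs last, and UTF-16 can "succeed" on ordinary single-byte text while producing unrelated characters, so it is left out of the default list. A clean decode only shows the bytes are valid in that encoding, not that it is the encoding the file was written in.

```python
from typing import Iterable


def first_working_encoding(
    data: bytes,
    candidates: Iterable[str] = ("utf-8", "cp1252", "latin-1"),
) -> str:
    """Return the first candidate encoding that decodes all of data without error.

    Hypothetical helper for illustration; not part of pandas.
    """
    for enc in candidates:
        try:
            data.decode(enc)  # strict mode: raises on any undecodable byte
        except UnicodeDecodeError:
            continue
        return enc
    raise ValueError("none of the candidate encodings could decode the data")


print(first_working_encoding(b"plain ascii"))    # utf-8
print(first_working_encoding(b"Temp 101\xb0F"))  # cp1252
print(first_working_encoding(b"\x81"))           # latin-1 (0x81 is undefined in cp1252)
```

With pandas this could be used as pd.read_csv(path, encoding=first_working_encoding(raw)), where raw is the file's bytes from open(path, 'rb').read(). That reads the whole file into memory, which may be too much for very large files.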
As a last resort, ignore the decoding errors:
with open('file.csv', encoding="utf8", errors='ignore') as csv_file:
    df = pd.read_csv(csv_file)
df.head()
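errors='ignore' gives no indication of how much text was lost. A possible middle ground (a swapped-in alternative, not what this answer used) is errors='replace', which puts U+FFFD, the Unicode replacement character, wherever an invalid byte sequence was found, so the damage can at least be counted. pandas' encoding_errors parameter accepts the same value. A sketch using io.TextIOWrapper over hypothetical in-memory CSV bytes, which decodes the same way open(..., encoding="utf8", errors="replace") would:

```python
import io

# Hypothetical CSV bytes with one byte (0xb0) that is not valid UTF-8
raw = b"VAERS_ID,SYMPTOM_TEXT\n916600,Temp 101\xb0F\n"

# Decode like open(..., encoding="utf8", errors="replace")
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf8", errors="replace") as csv_file:
    text = csv_file.read()

# Each invalid sequence became U+FFFD, so the damage is visible and countable
bad = text.count("\ufffd")
print(f"{bad} invalid byte sequence(s) replaced")  # 1 invalid byte sequence(s) replaced
```

The decoded text could then be parsed with pd.read_csv(io.StringIO(text)), with the replacement characters left in place for later inspection.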
Upvotes: 0