LLMA

Reputation: 21

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 136: invalid start byte

Hello, I am trying to read a CSV file. This was my code:

import pandas as pd
df = pd.read_csv("2021VAERSDATA.csv")

df.head()

And this is the error I received:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 136: invalid start byte
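The message means the file contains the byte 0xb0, which cannot begin a character in UTF-8, the encoding pandas uses by default. In the single-byte encodings Latin-1 (ISO-8859-1) and cp1252, that same byte is the degree sign °, which hints (but does not prove) that the file was saved in one of those encodings. A minimal reproduction, using made-up sample bytes rather than the real file:

```python
# Byte 0xb0 cannot start a UTF-8 character, so strict UTF-8 decoding fails,
# while single-byte encodings such as Latin-1 and cp1252 map it to the degree sign.
raw = b"Temp 38.5\xb0C"  # hypothetical sample bytes, not taken from the real file

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xb0 in position 9: invalid start byte

print(raw.decode("latin-1"))  # Temp 38.5°C
print(raw.decode("cp1252"))   # Temp 38.5°C
```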

I'm not sure how to correct this. Any advice would be greatly appreciated!

Edit:

Here are the first 3 rows of my file:

VAERS_ID RECVDATE STATE AGE_YRS CAGE_YR CAGE_MO SEX RPT_DATE SYMPTOM_TEXT DIED DATEDIED L_THREAT ER_VISIT HOSPITAL HOSPDAYS X_STAY DISABLE RECOVD VAX_DATE ONSET_DATE NUMDAYS LAB_DATA V_ADMINBY V_FUNDBY OTHER_MEDS CUR_ILL HISTORY PRIOR_VAX SPLTTYPE FORM_VERS TODAYS_DATE BIRTH_DEFECT OFC_VISIT ER_ED_VISIT ALLERGIES
916600 1/1/2021 TX 33 33 F Right of epiglottis swelled up and hinder swallowing pictures taken Benadryl Tylenol taken Y 12/28/2020 12/30/2020 2 None PVT None None None 2 1/1/2021 Y Pcn and bee venom
916601 1/1/2021 CA 73 73 F Approximately 30 min post vaccination administration patient demonstrated SOB and anxiousness. Assessed at time of event: Heart sounds normal, Lung sounds clear. Vitals within normal limits for patient. O2 91% on 3 liters NC Continuous flow. 2 consecutive nebulized albuterol treatments were administered. At approximately 1.5 hours post reaction, patients' SOB and anxiousness had subsided and the patient stated that they were feel "much better". Y 12/31/2020 12/31/2020 0 SEN Patient residing at nursing facility. See patients chart. Patient residing at nursing facility. See patients chart. Patient residing at nursing facility. See patients chart. 2 1/1/2021 Y "Dairy"

Upvotes: 2

Views: 3647

Answers (2)

Esraa Abdelmaksoud

Reputation: 1689

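I ran into the same issue while loading the same dataset. The code below should solve your problem. Note that encoding_errors requires pandas 1.3 or later, and that 'ignore' silently drops any bytes that cannot be decoded.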

import pandas as pd
df = pd.read_csv("2021VAERSDATA.csv", encoding_errors='ignore', low_memory=False)
df.head()
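To see what encoding_errors='ignore' actually does to the data, here is a small comparison on an in-memory CSV. The column names and sample bytes are hypothetical, and the encoding_errors parameter needs pandas 1.3 or later. Declaring a single-byte encoding such as Latin-1 keeps the character instead of dropping it, assuming that is the file's real encoding.

```python
import io

import pandas as pd

# Hypothetical CSV bytes with a Latin-1 degree sign (0xb0) in one field.
raw = b"ID,NOTE\n1,Temp 38.5\xb0C\n"

# encoding_errors="ignore" (pandas 1.3+) silently drops the undecodable byte.
df_ignored = pd.read_csv(io.BytesIO(raw), encoding_errors="ignore")
print(df_ignored.loc[0, "NOTE"])  # Temp 38.5C

# Declaring the (assumed) real encoding keeps the character instead.
df_latin1 = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(df_latin1.loc[0, "NOTE"])  # Temp 38.5°C
```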

Upvotes: 1

jose_bacoy

Reputation: 12684

This is how I read my CSV files. Please try it and let me know if it works.

with open('file.csv', encoding="utf8") as csv_file:
    df = pd.read_csv(csv_file)
df.head()

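Only binary mode, open(file, 'rb'), reads the file as bytes without decoding. In text mode (the default), Python decodes the file with the encoding you pass, or with the platform's default encoding if you omit it. Note that encoding="utf8" will still raise the same error if the file is not actually UTF-8.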
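The difference between binary and text mode can be checked directly. This sketch writes hypothetical non-UTF-8 sample bytes to a throwaway temporary file, then reads it back once in binary mode (raw bytes, no decoding) and once in text mode with an explicit encoding (every byte decoded by that codec).

```python
import os
import tempfile

# Write hypothetical non-UTF-8 bytes to a temporary file.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"NOTE\nTemp 38.5\xb0C\n")
    path = tmp.name

try:
    # Binary mode: no decoding happens, you get the raw bytes back.
    with open(path, "rb") as f:
        data = f.read()
    print(type(data).__name__)  # bytes

    # Text mode with an explicit encoding: bytes are decoded with that codec.
    with open(path, encoding="latin-1") as f:
        text = f.read()
    print(type(text).__name__)  # str
finally:
    os.remove(path)
```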

EDITED:

If UTF-8 does not work, try one of these encoding values instead: encoding='cp1252', encoding='utf-16', or encoding='ISO-8859-1'. They can also be passed directly to pd.read_csv.
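Trying encodings by hand can be automated with a small fallback loop. The helper name read_csv_with_fallback and its default candidate list are hypothetical choices for illustration. It decodes the raw bytes strictly with each candidate in turn and hands the first successful result to pandas. ISO-8859-1 maps every possible byte to a character, so it never raises and must come last; utf-16 is left out of the defaults because it can decode many unrelated byte sequences without raising and produce garbage.

```python
import io

import pandas as pd


def read_csv_with_fallback(source: bytes, encodings=("utf-8", "cp1252", "ISO-8859-1")):
    """Hypothetical helper: return (DataFrame, encoding) for the first codec that decodes the bytes."""
    for enc in encodings:
        try:
            text = source.decode(enc)  # strict decode: raises on any invalid byte
        except UnicodeDecodeError:
            continue  # this codec cannot represent the data; try the next one
        return pd.read_csv(io.StringIO(text)), enc
    raise ValueError("none of the candidate encodings could decode the data")


raw = b"ID,NOTE\n1,Temp 38.5\xb0C\n"  # hypothetical sample bytes
df, used = read_csv_with_fallback(raw)
print(used)  # cp1252
```

For a real file you would pass the bytes from open(path, "rb").read(); this loads the whole file into memory, which is acceptable for a one-off check but worth keeping in mind for very large files. A decode that succeeds only tells you the bytes are valid in that encoding, not that the text is correct, so it is still worth spot-checking a few rows.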

As a last resort, ignore the decoding errors:

with open('file.csv', encoding="utf8", errors='ignore') as csv_file:
    df = pd.read_csv(csv_file)
df.head()

Upvotes: 0
