wanuke

Reputation: 41

I can't convert a DataFrame to Parquet because of a data type error

I'm trying to convert a pandas DataFrame to Parquet, but I get the error ("Expected bytes, got a 'int' object", 'Conversion failed for column xxxxxxxx with type object'). The column comes from an Excel table that contains both numbers and strings, so it has dtype 'object', yet the conversion still fails. I've tried df['xxxxxxxx'].astype(str) and df['xxxxxxxx'].astype('data_type'), but neither works. I tried converting to Parquet with both AWS Wrangler and PyArrow.

Upvotes: 4

Views: 13087

Answers (5)

ChuongHo

Reputation: 89

I faced the same issue today and used map to resolve it:

# Convert every cell to str so each column holds a single type
df = df.map(str)
df.to_parquet("data.parquet", engine="fastparquet", compression="gzip")

Link (DataFrame.map was called applymap before pandas 2.1): https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.applymap.html
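Since the snippet uses DataFrame.map while the link points to applymap, here is a minimal sketch that should work on either side of the rename. The sample frame and the names stringify and df_str are made up for illustration. Note that this approach also turns missing values into text such as "nan", which may not be what you want.

```python
import pandas as pd

# Hypothetical sample data: column "a" mixes int and str.
df = pd.DataFrame({"a": [1, "x"], "b": [2.5, None]})

# Use DataFrame.map when the installed pandas has it (2.1+),
# otherwise fall back to the older applymap.
stringify = df.map if hasattr(df, "map") else df.applymap
df_str = stringify(str)

print(df_str["a"].tolist())
```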

Upvotes: 0

Anant Mulchandani

Reputation: 11

I got this error while saving my pandas DataFrame to Parquet using AWS Wrangler. In my case it happened because the first few rows of a column were of datetime type and the remaining rows were of string type. I used this to check for columns that contain more than one data type:

for c in range(df.shape[1]):
    for i in range(df.shape[0]):
        # Compare each cell's type with the type of the first cell in the column
        if type(df.iloc[0, c]) != type(df.iloc[i, c]):
            print("difference found in cell", i, c)
            print("column name =", df.columns[c])
            break  # report only the first mismatch per column

# If the difference is caused by NaN values (which are floats), ignore it

Then convert all the rows of the identified columns to a single data type.
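The same check can be sketched without the cell-by-cell loop, which gets slow on large frames. This is only an illustration: the helper name mixed_type_columns and the sample data are assumptions, and it skips missing values so that NaN is not reported as a second type.

```python
import pandas as pd

def mixed_type_columns(df: pd.DataFrame) -> dict:
    """Map each object column holding more than one Python type to its type names."""
    mixed = {}
    for col in df.select_dtypes(include="object").columns:
        # Drop missing values first so NaN (a float) is not counted as a type.
        types = set(df[col].dropna().map(lambda v: type(v).__name__))
        if len(types) > 1:
            mixed[col] = types
    return mixed

# Hypothetical sample: only column "a" mixes int and str.
df = pd.DataFrame({
    "a": [1, "x", 3],
    "b": ["p", "q", None],
    "c": [1.0, 2.0, 3.0],
})
print(mixed_type_columns(df))
```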

Upvotes: 0

Dranikf

Reputation: 196

I had the same problem. Passing the engine='fastparquet' argument to the to_parquet method fixed it for me.

Upvotes: 2

Alejandro Henao

Reputation: 191

As mentioned in this other question

Converting the column to a single general type could work, so try:

df['xxxxxxxx'] = df['xxxxxxxx'].astype(str)
df.to_parquet(path)

However, this is not good practice because it hides the type error. Consider fixing the column's type by separating the data, or at least be aware that this column holds different types. Pandas includes a warning for this kind of problem:

   Columns (# of column) have mixed types. Specify dtype option on import or set low_memory=False.
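One way to "separate the data" instead of casting everything to str is sketched below. It assumes the column mixes numbers with free text; the output names value_num and value_text are hypothetical. pd.to_numeric with errors="coerce" turns anything non-numeric into NaN, which gives a mask for splitting.

```python
import pandas as pd

# Hypothetical sample resembling the mixed column from the question.
df = pd.DataFrame({"xxxxxxxx": [10, "abc", 25, "7"]})

# Values that cannot be parsed as numbers become NaN.
numeric = pd.to_numeric(df["xxxxxxxx"], errors="coerce")

# Keep the parsed numbers in one column and the leftover text in another.
df["value_num"] = numeric
df["value_text"] = df["xxxxxxxx"].where(numeric.isna()).astype("string")
df = df.drop(columns="xxxxxxxx")

print(df.dtypes)
```

Each resulting column then has a single type, so df.to_parquet(path) should no longer hit the mixed-type conversion error for it.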

Upvotes: 5

Aravinth Balakrishnan

Reputation: 96

Did you try:

df['xxxxxxxx'] = df['xxxxxxxx'].astype(bytes)

Upvotes: 1
