Reputation: 41
I'm trying to convert a pandas DataFrame to Parquet, but I get the error ("Expected bytes, got a 'int' object", 'Conversion failed for column xxxxxxxx with type object'). The source table in Excel contains both numbers and strings, so the column has dtype 'object', yet the conversion still fails. I've tried df['xxxxxxxx'].astype(str) and df['xxxxxxxx'].astype('data_type'), but neither works. I tried converting to Parquet with both AWS Wrangler and PyArrow.
Upvotes: 4
Views: 13087
Reputation: 89
I faced the same issue today and resolved it with map (DataFrame.map needs pandas 2.1 or later; on older versions use applymap, linked below):
df = df.map(str)
df.to_parquet("data.parquet", engine="fastparquet", compression="gzip")
Link: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.applymap.html
Upvotes: 0
Reputation: 11
I got this error while saving my pandas DataFrame to Parquet using AWS Wrangler.
In my case it happened because the first few rows of a column were of datetime type and the remaining rows were of string type. I used this to check for columns that contain more than one datatype:
for c in range(df.shape[1]):
    for i in range(df.shape[0]):
        if type(df.iloc[0, c]) != type(df.iloc[i, c]):
            print("difference found in cell ", i, c)
            print("column name =", df.columns[c])
            break
# if you get difference for nan types (float) ignore that
Then convert all the rows of the identified columns to a single datatype.
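The same check can be written more compactly by collecting the set of Python types found in each object column. This is only a sketch: the helper name mixed_type_columns and the sample frame are made up for illustration, and missing values are dropped first so that NaN (a float) does not show up as a false positive.

```python
import pandas as pd


def mixed_type_columns(df: pd.DataFrame) -> dict:
    """Return {column: set of type names} for object columns holding more than one type.

    Hypothetical helper for illustration; assumes column names are unique.
    """
    report = {}
    for col in df.columns:
        if df[col].dtype != object:
            continue  # typed columns (int64, float64, ...) cannot be mixed
        # Drop missing values so NaN/None are not counted as a separate type.
        kinds = {type(v).__name__ for v in df[col].dropna()}
        if len(kinds) > 1:
            report[col] = kinds
    return report


# Made-up sample data: column "a" mixes int and str, the others are uniform.
example = pd.DataFrame({"a": [1, "x", None], "b": ["p", "q", "r"], "c": [1, 2, 3]})
print(mixed_type_columns(example))
```

Any column reported here is a candidate for the conversion failure and should be cast to a single type before writing.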
Upvotes: 0
Reputation: 196
I had the same problem. Setting the engine='fastparquet' argument of the to_parquet method helped me.
Upvotes: 2
Reputation: 191
As mentioned in this other question, casting the column to one general type could work. So try:
df['xxxxxxxx'] = df['xxxxxxxx'].astype(str)
df.to_parquet(path)
However, this is not good practice because it hides the type error. Consider fixing the column's type by separating the data, or at least be aware that this column holds different types. Pandas includes a warning for this kind of error:
Columns (# of column) have mixed types. Specify dtype option on import or set low_memory=False.
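If you do cast to str, keep in mind that astype(str) can also turn missing values into text such as "None" or "nan". Here is a minimal sketch that casts only the non-missing values. The column name xxxxxxxx is taken from the question and the sample data is made up:

```python
import pandas as pd

# Made-up sample: one object column mixing int, str, a missing value and float.
df = pd.DataFrame({"xxxxxxxx": [1, "abc", None, 2.5]})

col = df["xxxxxxxx"]
# Series.where keeps the original value where the condition is True (missing cells)
# and takes the string-cast value everywhere else.
df["xxxxxxxx"] = col.where(col.isna(), col.astype(str))

print(df["xxxxxxxx"].tolist())
```

After this the column holds only strings and missing values, which is the single-type layout that the df.to_parquet(path) call above expects.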
Upvotes: 5
Reputation: 96
Did you try:
df['xxxxxxxx'] = df['xxxxxxxx'].astype(bytes)
Upvotes: 1