Reputation: 1
I have an xlsx file with 15000 records. I'm trying to serialize the data for an API service: read the file and send it in an HTTP response.
The input data looks like this:
account_name | dr_code | cr_code | amount | rate | category
A            | 12582   | 12582   | 5000   | 30   | POP
B            | 55AG98  | 55AG98  | 2000   | 40   | POP
C            | 5ER0AB  |         | 5000   | 2.2  | POP
The code is as follows:
import pandas
import pyarrow
df = pandas.read_excel("file.xlsx", usecols=[0, 1, 4, 6, 7, 8])
b = pyarrow.Table.from_pandas(df, preserve_index=True)
I get this error:
pyarrow.lib.ArrowInvalid: ("Could not convert '55AG98' with type str: tried to convert to int", 'Conversion failed for column dr_code with type object')
The code above works when every value in a column has the same data type, but fails when a column contains a mix of types.
Upvotes: 0
Views: 2419
Reputation: 25220
If you don't want pyarrow to guess what types the result should have, you need to pass a schema when doing this conversion.
E.g.
import pandas
import pyarrow as pa
df = pandas.read_excel("file.xlsx", usecols=[0, 1, 4, 6, 7, 8])
# Declare each column's type explicitly instead of letting pyarrow infer it
schema = pa.schema([
    ('account_name', pa.string()),
    ('dr_code', pa.string()),
    ('cr_code', pa.string()),
    ('amount', pa.float64()),
    ('rate', pa.float64()),
    ('category', pa.string()),
])
b = pa.Table.from_pandas(df, schema=schema, preserve_index=True)
Upvotes: 1