I have a JSON file with a single field, and in the end I want to convert it to Parquet, so I need to change the field's type:
{"SomeDecimal": "44444"}
and this code:
import pyarrow as pa
from pyarrow import json
schema = pa.schema([
    pa.field('SomeDecimal', pa.float32())
])
one = pa.json.read_json('sample.json', parse_options=json.ParseOptions(explicit_schema=schema))
According to the documentation, if I understand it correctly, this should change the SomeDecimal type from string to float. But it doesn't work.
I got the error:
ArrowInvalid: JSON parse error: Column(/SomeDecimal) changed from number to string in row 0
Why does the error say that the column changed from number to string? What is wrong here? I can't find a solution for this error.
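Arrow can't interpret "44444" as a float32, because the quotes make it a JSON string. It should be 44444, without quotes. With explicit_schema declaring SomeDecimal as float32, the parser expects a JSON number in that column; it finds a string in row 0 instead, which is what "changed from number to string" refers to.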
One workaround is to load the data as a string and cast it afterwards:
import pyarrow as pa
from pyarrow import json

# Schema used for reading: keep the quoted value as a string
raw_schema = pa.schema([
    pa.field('SomeDecimal', pa.string())
])
# Schema with the type you actually want in the output
processed_schema = pa.schema([
    pa.field('SomeDecimal', pa.float32())
])

raw_table = json.read_json('sample.json', parse_options=json.ParseOptions(explicit_schema=raw_schema))
# Convert the string column to float32
table = raw_table.cast(processed_schema)