martin

Reputation: 1185

Change JSON field type while reading with pyarrow

I have a JSON file with a single field. In the end I want to convert it to Parquet, so I need to change the field's type:

{"SomeDecimal": "44444"}

and code:

import pyarrow as pa
from pyarrow import json

schema = pa.schema([
    pa.field('SomeDecimal', pa.float32())
])

one = json.read_json('sample.json', parse_options=json.ParseOptions(explicit_schema=schema))

According to the documentation, if I understand it correctly, that should change the type of SomeDecimal from string to float. But it doesn't work.

I got the error:

ArrowInvalid: JSON parse error: Column(/SomeDecimal) changed from number to string in row 0

Why does the error say the column changed from number to string? What is wrong here? I can't find a solution for this error.

Upvotes: 3

Views: 3315

Answers (1)

0x26res

Reputation: 13902

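Arrow's JSON reader can't interpret "44444" as a float32. For that the value would have to be 44444, without quotes. The explicit schema tells the reader to expect a number in that column, but row 0 contains a string, which is what the error message is reporting.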

One workaround is to load the column as a string and cast it afterwards:

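import pyarrow as pa
from pyarrow import json

# Schema matching what is actually in the file: a quoted value is a string.
raw_schema = pa.schema([
    pa.field('SomeDecimal', pa.string())
])
# Schema you want to end up with.
processed_schema = pa.schema([
    pa.field('SomeDecimal', pa.float32())
])

# Read the values as strings, then cast the whole table to the target schema.
raw_table = json.read_json('sample.json', parse_options=json.ParseOptions(explicit_schema=raw_schema))
table = raw_table.cast(processed_schema)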

Upvotes: 5
