nojohnny101

Reputation: 562

How to serialize an object that can't be serialized?

I need to serialize an object that reports it can't be serialized. I am using the external library pyarrow, specifically ParquetDataset objects and their schemas. I can print the schema, and it comes out like this:

stage_dataset: <pyarrow.parquet.ParquetDataset object at 0x7f8ddcc088d0>
stage_dataset_schema: <pyarrow._parquet.ParquetSchema object at 0x7f8ddc287dd0>
machine_id: BYTE_ARRAY String
wkstn_grp: BYTE_ARRAY String
charge_unit: BYTE_ARRAY String
workstation: BYTE_ARRAY String
wstndesc: BYTE_ARRAY String
current_part_no: BYTE_ARRAY String
current_oper_no: BYTE_ARRAY String
laborclass: BYTE_ARRAY String
jobclass: BYTE_ARRAY String
dml_operation: BYTE_ARRAY String

I need all of those columns and data types in JSON, a dictionary, or something similar. I don't have the option of modifying the class to make it serializable, as that is tech debt I don't want to create. Is there a different class or method in pyarrow that would allow JSON output?

Upvotes: 0

Views: 360

Answers (1)

py_dude

Reputation: 832

  1. You can write your own Serializer: subclass json.JSONEncoder, override its default() method, and pass the class to json.dumps(data, cls=Serializer)
  2. You can use marshmallow (https://marshmallow.readthedocs.io/en/stable/) and define your own fields (or even a whole schema) to serialize the object correctly

The second option is preferable.
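
For reference, the first option could look roughly like the sketch below. It uses only the standard library; ColumnInfo is a hypothetical stand-in for a non-serializable library object, and its attribute names (name, physical_type, logical_type) are assumptions for illustration, not a confirmed pyarrow API. With a real schema you would read the equivalent values from the schema object instead.

```python
import json


class ColumnInfo:
    """Hypothetical stand-in for a library object that json cannot serialize."""

    def __init__(self, name, physical_type, logical_type):
        self.name = name
        self.physical_type = physical_type
        self.logical_type = logical_type


class Serializer(json.JSONEncoder):
    """Custom encoder: json calls default() for any object it cannot handle."""

    def default(self, o):
        if isinstance(o, ColumnInfo):
            # Convert the object into plain, JSON-friendly types.
            return {
                "name": o.name,
                "physical_type": o.physical_type,
                "logical_type": o.logical_type,
            }
        # Fall back to the base class, which raises TypeError for unknown types.
        return super().default(o)


# Sample columns taken from the schema printed in the question.
columns = [
    ColumnInfo("machine_id", "BYTE_ARRAY", "String"),
    ColumnInfo("wkstn_grp", "BYTE_ARRAY", "String"),
]

# Note that the object is passed positionally; cls selects the encoder.
schema_json = json.dumps(columns, cls=Serializer)
print(schema_json)
```

This leaves the original class untouched: all of the conversion logic lives in the encoder, so nothing in the library needs to be modified.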

Upvotes: 1
