Will

Reputation: 11500

How do you access a schema's field metadata in PySpark?

Say you have a schema set up like this:

from pyspark.sql.types import StructField, StructType, IntegerType, StringType

schema = StructType([
    StructField(name='a_field', dataType=IntegerType(), nullable=False, metadata={'a': 'b'}),
    StructField(name='b_field', dataType=StringType(), nullable=True, metadata={'c': 'd'})
])

How would you access the metadata?

Upvotes: 0

Views: 4366

Answers (1)

Will

Reputation: 11500

You can see the schema structure with:

>>> schema.json()
'{"fields":[{"metadata":{"a":"b"},"name":"a_field","nullable":false,"type":"integer"},
            {"metadata":{"c":"d"},"name":"b_field","nullable":true,"type":"string"}],
  "type":"struct"}'

To access the metadata, go through the schema's fields and read each field's metadata attribute (a dict):

>>> schema.fields[0].metadata['a']
'b'

>>> schema.fields[1].metadata['c']
'd'

Upvotes: 2
