LearneR

Reputation: 2541

What are the different PySpark functions (like input_file_name()) that provide information about metadata?

I'm trying to dig into PySpark and find all the different ways to track metadata about the files read into the Spark session. I primarily use Databricks and would like to find more functions, like the ones listed below, that provide vital metadata about my data.

input_file_name()
printSchema()
df.describe().show()

I'm totally new to PySpark and I don't know how to get this kind of information. Is there a way I can get a list of all such metadata functions in PySpark? Thanks in advance.

Upvotes: 0

Views: 97

Answers (1)

CHEEKATLAPRADEEP

Reputation: 12768

There are several ways to get metadata about a DataFrame.

For the schema of a dataset df, you can use df.schema, df.schema.fields, df.schema.fieldNames(), df.printSchema(), and df.describe().show().

df.printSchema()

(screenshot: output of df.printSchema())

df.describe().show()

(screenshot: output of df.describe().show())

df.schema

(screenshot: output of df.schema)

Registering a temporary view with createOrReplaceTempView also exposes the schema, since you can then query it through Spark SQL.

df.createOrReplaceTempView("storm")

(screenshot: schema of the "storm" temp view)

Reference: PySpark documentation

Upvotes: 1
