Reputation: 2541
I'm trying to dig into PySpark and find all the different ways to track metadata about the files read into the Spark context. I primarily use Databricks and would like to learn about functions like the ones listed below that provide vital metadata about my data.
input_file_name()
printSchema()
df.describe().show()
I'm totally new to PySpark and don't know how to get this kind of information. Is there a way I can get a list of all such metadata functions in PySpark? Thanks in advance.
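For context, this is roughly how I've seen those three used so far (a minimal sketch; assumes a Databricks notebook where spark is predefined, and uses one of the Databricks sample dataset paths):
from pyspark.sql.functions import input_file_name

df = spark.read.json("/databricks-datasets/structured-streaming/events/")
df.select(input_file_name().alias("source_file")).distinct().show(truncate=False)  # file each row came from
df.printSchema()      # column names and types as a tree
df.describe().show()  # basic summary statistics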
Upvotes: 0
Views: 97
Reputation: 12768
These are different ways to get metadata about a DataFrame.
For the schema of a DataFrame df, you can use df.schema, df.schema.fields, df.schema.fieldNames(), df.printSchema(), and df.describe().show().
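For example, a minimal sketch (assumes a SparkSession named spark, as in a Databricks notebook; the toy DataFrame is just for illustration):
df = spark.range(3).withColumnRenamed("id", "n")
df.schema                # StructType describing the DataFrame
df.schema.fields         # list of StructField objects (name, type, nullable)
df.schema.fieldNames()   # list of column names
df.printSchema()         # schema printed as a tree
df.describe().show()     # count, mean, stddev, min, max per column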
Even createOrReplaceTempView gives you a route to the schema information: register the DataFrame as a temporary view, then query it with Spark SQL.
df.createOrReplaceTempView("storm")
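Once the view exists, a DESCRIBE query returns the schema as rows (a small sketch, assuming the same SparkSession):
spark.sql("DESCRIBE storm").show()   # col_name, data_type, comment for each column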
Reference: PySpark documentation
Upvotes: 1