Reputation: 2541
I'm trying to dig into PySpark and find all the different ways to track metadata about the files read into the Spark context. I primarily use Databricks and would like to learn about functions like the ones listed below that provide vital metadata about my data.
input_file_name()
printSchema()
df.describe().show()
I'm totally new to PySpark and don't know how to get this kind of information. Is there a way I can get a list of all such metadata functions in PySpark? Thanks in advance.
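For context, this is roughly how I've seen those three used so far (a minimal sketch; assumes a Databricks notebook where spark is predefined, and uses one of the Databricks sample dataset paths):
from pyspark.sql.functions import input_file_name

df = spark.read.json("/databricks-datasets/structured-streaming/events/")
df.select(input_file_name().alias("source_file")).distinct().show(truncate=False)  # file each row came from
df.printSchema()      # column names and types as a tree
df.describe().show()  # basic summary statistics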
Upvotes: 0
Views: 97
Reputation: 12768
These are different ways to get metadata about a DataFrame.
For the schema of a DataFrame df, you can use df.schema, df.schema.fields, df.schema.fieldNames(), df.printSchema(), and df.describe().show().
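For example, a minimal sketch (assumes a SparkSession named spark, as in a Databricks notebook; the toy DataFrame is just for illustration):
df = spark.range(3).withColumnRenamed("id", "n")
df.schema                # StructType describing the DataFrame
df.schema.fields         # list of StructField objects (name, type, nullable)
df.schema.fieldNames()   # list of column names
df.printSchema()         # schema printed as a tree
df.describe().show()     # count, mean, stddev, min, max per column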
Even createOrReplaceTempView gives you a route to the schema information: register the DataFrame as a temporary view, then query it with Spark SQL.
df.createOrReplaceTempView("storm")
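Once the view exists, a DESCRIBE query returns the schema as rows (a small sketch, assuming the same SparkSession):
spark.sql("DESCRIBE storm").show()   # col_name, data_type, comment for each column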
Reference: PySpark documentation
Upvotes: 1