Reputation: 21
I am using DataFrames to read data from Parquet files, creating a temporary view, and running SQL queries on top of the temp view.
spark.read.parquet("filename.parquet").createOrReplaceTempView("temptable")
val df = spark.sql("SELECT * FROM temptable")
To check the result of df I am using df.show(), but it takes a long time to execute, and I did not see any difference when I used df.take(10) instead.
Is there any difference between take() and show()? Which method should I use for better performance when checking the results?
Upvotes: 2
Views: 7852
Reputation: 1716
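take() and show() are different. show() prints the results (the first 20 rows by default) and returns nothing, while take(n) returns the first n rows to the driver as a list of Row objects (in PySpark; an Array[Row] in Scala), which can be used to create a new DataFrame. They are both actions.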
Print results
df.show()
Get list of rows (PySpark)
sampleList = df.take(10)
Upvotes: 13