Maneesh Potti

Reputation: 21

What is the difference between dataframe.show() and dataframe.take() in Spark? What do I need to increase to improve performance?

I am using DataFrames to read data from parquet files, creating a temporary view, and running SQL queries on top of the temp views.

spark.read.parquet("filename.parquet").createOrReplaceTempView("temptable")

val df = spark.sql("SELECT * FROM temptable")

To check the result of df I am using df.show(), but it takes a long time to execute, and I did not see any difference when I used df.take(10).

Is there any difference between take() and show()? Which method should I use to check the results with better performance?

Upvotes: 2

Views: 7852

Answers (1)

Michael West

Reputation: 1716

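take() and show() are different. show() prints the first rows (20 by default) as a formatted table and returns nothing, while take(n) returns the first n rows to the driver as a list of Row objects (in PySpark; an Array[Row] in Scala), which you can process further or use to create a new DataFrame. They are both actions. In fact, show() is built on top of take(): it fetches n + 1 rows so it knows whether to print the "only showing top n rows" footer. That is why you see no performance difference between them; the cost comes from the query that produces the rows, not from which of the two methods you call.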

Print results

df.show() 

Get list of rows (PySpark)

sampleList = df.take(10)

Upvotes: 13
