Pear

Reputation: 865

Why is Spark's show() function very slow?

I have the following:

df.select("*").filter(df.itemid==itemid).show()

and it never terminates. However, if I do

print(df.select("*").filter(df.itemid==itemid))

It prints in less than a second. Why is this?

Upvotes: 4

Views: 15692

Answers (2)

etinika

Reputation: 27

This usually happens if you don't have enough available memory on your computer. Free up some memory and try again.

Upvotes: -1

Justin Pihony

Reputation: 67115

That's because select and filter are transformations: they only build up the execution instructions and don't do anything with the data yet. Printing the DataFrame just displays its schema, so it returns immediately. When you call show, which is an action, Spark actually executes those instructions. If it isn't terminating, I'd review the logs to see if there are any errors or connection issues. Or maybe the dataset is still too large; try taking only 5 rows (e.g. show(5) or take(5)) to see if that comes back quickly.
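To illustrate the lazy-evaluation point, here is a toy pure-Python model. `LazyFrame`, its methods, and the simulated `source` table are hypothetical and only mimic the idea (transformations record steps, actions run them); they are not PySpark's API and not how Spark executes plans internally.

```python
import itertools
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, int]


class LazyFrame:
    """Hypothetical toy model of a lazily evaluated DataFrame (not PySpark's API)."""

    def __init__(self, source: Callable[[], Iterable[Row]],
                 steps: Optional[List[Callable[[Row], bool]]] = None) -> None:
        self._source = source            # callable that produces the raw rows
        self._steps = list(steps or [])  # recorded filters, not yet executed
        self.rows_read = 0               # source rows consumed by actions so far

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # "Transformation": only records the step; no data is read here.
        return LazyFrame(self._source, self._steps + [predicate])

    def __repr__(self) -> str:
        # Printing describes the pending plan without touching any data.
        return f"LazyFrame(pending_steps={len(self._steps)})"

    def _run(self) -> Iterator[Row]:
        # Executes the recorded plan row by row, counting every row it reads.
        for row in self._source():
            self.rows_read += 1
            if all(step(row) for step in self._steps):
                yield row

    def take(self, n: int) -> List[Row]:
        # "Action": runs the plan, stopping once n matching rows are found.
        return list(itertools.islice(self._run(), n))

    def show(self, n: int = 20) -> None:
        # "Action": prints up to n rows (20 mirrors the default of Spark's show()).
        for row in self.take(n):
            print(row)


def source() -> Iterator[Row]:
    # Hypothetical simulated table with 1,000,000 rows.
    for i in range(1_000_000):
        yield {"itemid": i % 1000, "value": i}


df = LazyFrame(source)
filtered = df.filter(lambda r: r["itemid"] == 42)

print(filtered)            # LazyFrame(pending_steps=1): instant, nothing read
print(filtered.rows_read)  # 0

rows = filtered.take(5)    # only now is the source actually scanned
read_after_take = filtered.rows_read
print(len(rows), read_after_take)  # 5 4043

filtered.show(3)
```

In this model printing is cheap because it only describes the plan; the cost appears when an action runs. Asking for fewer rows lets the scan stop as soon as enough matches are found, but if the filter matches nothing, an action still has to read the entire source before it can finish.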

Upvotes: 3
