Reputation: 865
I have
df.select("*").filter(df.itemid==itemid).show()
and it never terminates. However, if I do
print df.select("*").filter(df.itemid==itemid)
It prints in less than a second. Why is this?
Upvotes: 4
Views: 15692
Reputation: 27
This usually happens if you don't have enough memory available on your computer. Free up some memory and try again.
Upvotes: -1
Reputation: 67115
That's because select and filter are just building up the execution instructions, so they aren't doing anything with the data. That is why printing the result returns almost immediately: it only describes the DataFrame, it doesn't compute it. Then, when you call show, it actually executes those instructions. If it isn't terminating, I'd review the logs to see if there are any errors or connection issues. Or maybe the dataset is still too large: try taking only 5 rows to see whether that comes back quickly.
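To make the difference between building instructions and executing them concrete, here is a toy model in plain Python. It is only a sketch of the idea, not Spark's implementation or API: LazyFrame, rows_scanned, and the sample data are hypothetical names made up for this illustration.

```python
from itertools import islice


class LazyFrame:
    """Hypothetical stand-in for a DataFrame: records steps, runs them on demand."""

    def __init__(self, rows, steps=()):
        self._rows = rows        # source data: a list of dicts
        self._steps = steps      # recorded (name, function) steps, not yet run
        self.rows_scanned = 0    # how many source rows have actually been read

    def select(self, *columns):
        # Transformation: only records a projection step and returns a new frame.
        step = lambda rows: ({c: r[c] for c in columns} for r in rows)
        return LazyFrame(self._rows, self._steps + (("select", step),))

    def filter(self, predicate):
        # Transformation: only records a filtering step and returns a new frame.
        step = lambda rows: (r for r in rows if predicate(r))
        return LazyFrame(self._rows, self._steps + (("filter", step),))

    def __repr__(self):
        # Printing describes the recorded plan; no source row is read.
        names = " -> ".join(name for name, _ in self._steps) or "scan"
        return "LazyFrame[plan: " + names + "]"

    def _scan(self):
        # Reads the source one row at a time and counts each row it reads.
        for row in self._rows:
            self.rows_scanned += 1
            yield row

    def _execute(self):
        # Chains the recorded steps over the source as lazy generators.
        rows = self._scan()
        for _, step in self._steps:
            rows = step(rows)
        return rows

    def take(self, n):
        # Action: runs the plan, but stops once n result rows are found.
        return list(islice(self._execute(), n))

    def collect(self):
        # Action: runs the plan over the whole source.
        return list(self._execute())


data = [{"itemid": i % 100, "value": i} for i in range(10000)]
query = LazyFrame(data).select("itemid", "value").filter(lambda r: r["itemid"] == 7)

print(query)                           # only describes the plan
scanned_before = query.rows_scanned    # nothing has been read yet
first_five = query.take(5)             # action: now rows are actually read
scanned_after_take = query.rows_scanned
```

In this sketch, printing the query is instant because it never touches the data, while take(5) does real work but can stop early. The analogous check in PySpark would be calling take(5) on the filtered DataFrame instead of show() on the whole thing.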
Upvotes: 3