Reputation: 399
I am using PySpark (the Python API for Apache Spark) in a Jupyter notebook. In one of the machine learning tutorials, the instructors were using seaborn with PySpark. How can I install and use third-party libraries like seaborn with PySpark?
Upvotes: 0
Views: 1136
Reputation: 703
Generally, for plotting, you need to move all the data points to the driver (the master node), using functions like collect(), before you can plot. Plotting is not possible while the data is still distributed across the cluster's memory.
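As a rough sketch of what that looks like in practice: seaborn only needs to be installed in the Python environment the driver (your notebook kernel) runs in, for example with `%pip install seaborn` in a notebook cell, assuming a pip-managed environment. The helper names `to_local_pandas` and `scatter_from_spark` below are made up for illustration and are not part of PySpark or seaborn. The example uses the Spark DataFrame methods `limit()` and `toPandas()` instead of `collect()` so that seaborn receives a pandas DataFrame directly.

```python
def to_local_pandas(spark_df, max_rows=10_000):
    """Bring at most max_rows rows of a Spark DataFrame to the driver.

    Hypothetical helper: limit() is applied on the cluster, so only the
    capped subset is transferred and converted to a pandas DataFrame.
    Note that limit() keeps the first rows it finds, not a random sample.
    """
    return spark_df.limit(max_rows).toPandas()


def scatter_from_spark(spark_df, x, y, max_rows=10_000):
    """Plot two columns of a Spark DataFrame with seaborn (hypothetical helper)."""
    # Imported here so that to_local_pandas can be reused without seaborn.
    import seaborn as sns

    pdf = to_local_pandas(spark_df, max_rows)  # data is now local to the driver
    return sns.scatterplot(data=pdf, x=x, y=y)  # returns a matplotlib Axes
```

In a notebook you would call something like `scatter_from_spark(df, "x", "y")`, where `df` and the column names are placeholders for your own DataFrame. The row cap is a design choice to keep a large distributed dataset from exhausting the driver's memory; for a representative plot, sampling or aggregating in Spark before converting may be a better fit than taking the first rows.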
Upvotes: 4