Fasty

Reputation: 804

How can I access pyspark installed inside an HDFS head node cluster?

I have a head node that is part of a Hadoop cluster. PySpark is installed for the hdfs user, i.e. I am able to use the pyspark shell when logged in as hdfs. But for the head node user, pyspark is not installed, so I am not able to access files from HDFS and bring them into pyspark.

I installed pyspark for the head node user as well, but I still cannot access HDFS files. I am assuming that Jupyter is not able to use the Spark that is installed for the hdfs user. How can I use that Spark install from a Jupyter notebook, so that I can access HDFS files inside Jupyter?

Currently, when I try to access HDFS files inside Jupyter,

it says 'Spark is not installed'.

I know this is broad; if I have under-emphasised or over-emphasised any point, let me know in the comments.

Upvotes: 0

Views: 95

Answers (1)

Doron Veeder

Reputation: 77

Is the head node a different Linux account, or is it a different Linux host?

If it is just a different account, then compare the environment variables on both accounts: log in as hdfs and run "env | sort", and then do the same on the head node.
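If you prefer to do that comparison from Python, a rough equivalent of "env | sort" (run it under both accounts and diff the output) could be:

    import os

    # Print only the variables most likely to matter for Spark,
    # so the two accounts' output is easy to compare side by side.
    for key in sorted(os.environ):
        if any(s in key for s in ("SPARK", "PYSPARK", "HADOOP", "PATH", "JAVA")):
            print(f"{key}={os.environ[key]}")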

Check mainly whether there are differences in PATH and in the SPARK environment variables (SPARK_HOME in particular).
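If SPARK_HOME turns out to be set only for the hdfs account, you can usually point the head node's Python (and hence Jupyter) at that same Spark install with the findspark package. A minimal sketch, assuming a hypothetical Spark location /usr/hdp/current/spark2-client and a hypothetical HDFS file /data/sample.txt (substitute the SPARK_HOME value and a path that actually exist on your cluster):

    import findspark

    # Point this Python process at the existing Spark install.
    # The path below is an assumption; use the SPARK_HOME value
    # you found in the hdfs account's environment.
    findspark.init("/usr/hdp/current/spark2-client")

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-access-test").getOrCreate()

    # Hypothetical HDFS path, just to verify HDFS access works.
    df = spark.read.text("hdfs:///data/sample.txt")
    df.show(5)

If that works in a plain python shell under the head node account, the same code should work in a Jupyter notebook, as long as the notebook kernel runs as that account.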

Upvotes: 0
