Reputation: 804
I have a head node for a Hadoop cluster. PySpark is installed on the HDFS cluster, i.e. I am able to use the pyspark shell as the hdfs user, but it is not installed for the headnode user, so I am not able to access files from HDFS and bring them into PySpark. How can I use the PySpark that is installed for hdfs from a Jupyter notebook? I installed PySpark for the headnode user, but I am still not able to access HDFS files. I am assuming that Jupyter is not picking up the Spark installation that hdfs uses. How do I enable it so that I can access HDFS files inside Jupyter?
Currently, when I try to access HDFS files inside Jupyter, it says 'Spark is not installed'. Roughly what I am attempting is sketched below.
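To make it concrete, this is the kind of notebook cell that fails for me (the app name and HDFS path are placeholders):

```python
# The kind of cell I run in the notebook (HDFS path is a placeholder)
from pyspark.sql import SparkSession  # this is where things go wrong

spark = (
    SparkSession.builder
    .appName("read-from-hdfs")
    .getOrCreate()
)

# Read a text file directly from HDFS into a DataFrame
df = spark.read.text("hdfs:///user/headnode/sample.txt")
df.show(5)
```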
I know this is broad; if I have under-emphasised or over-emphasised any point, let me know in the comments.
Upvotes: 0
Views: 95
Reputation: 77
Is the headnode a different Linux account, or a different Linux host?
If it is just a different account, compare the environment variables on both accounts: log in as hdfs and run "env | sort", then do the same on the headnode.
Check mainly for differences in PATH and in the Spark-related variables (in particular SPARK_HOME).
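If the only difference turns out to be SPARK_HOME/PATH, you can usually point the notebook at the cluster's existing Spark install instead of a separately pip-installed one. A minimal sketch, assuming the findspark package is installed and that the path below is whatever SPARK_HOME shows under the hdfs account (the path here is a placeholder):

```python
# Point this Python process at the cluster's existing Spark install.
# The path is an assumption: substitute the SPARK_HOME value you see
# in `env | sort` under the hdfs account.
import findspark
findspark.init("/usr/hdp/current/spark2-client")  # hypothetical SPARK_HOME

from pyspark.sql import SparkSession

# Run on YARN so the session uses the cluster's HDFS as its filesystem.
spark = (
    SparkSession.builder
    .master("yarn")
    .appName("jupyter-hdfs-check")
    .getOrCreate()
)

# Quick sanity check that HDFS is reachable (path is a placeholder).
spark.read.text("hdfs:///tmp/test.txt").show(5)
```

If findspark is not an option, the equivalent by hand is to export SPARK_HOME and add $SPARK_HOME/python plus the py4j zip under $SPARK_HOME/python/lib to PYTHONPATH before starting Jupyter.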
Upvotes: 0