Reputation: 2109
I am using Spark 1.6.0 on three VMs, 1x Master (standalone), 2x workers w/ 8G RAM, 2CPU each.
I am using the kernel configuration below:
{
"display_name": "PySpark ",
"language": "python3",
"argv": [
"/usr/bin/python3",
"-m",
"IPython.kernel",
"-f",
"{connection_file}"
],
"env": {
"SPARK_HOME": "<mypath>/spark-1.6.0",
"PYTHONSTARTUP": "<mypath>/spark-1.6.0/python/pyspark/shell.py",
"PYSPARK_SUBMIT_ARGS": "--master spark://<mymaster>:7077 --conf spark.executor.memory=2G pyspark-shell --driver-class-path /opt/vertica/java/lib/vertica-jdbc.jar"
}
}
Currently, this works: I get the Spark context sc
and sqlContext
pre-defined, without any imports, just as in the pyspark shell. (Note that pyspark-shell must be the last token in PYSPARK_SUBMIT_ARGS; anything after it is ignored by spark-submit.)
The problem comes when I use multiple notebooks: on my Spark master I see two 'pyspark-shell' apps, which kind of makes sense, but only one can run at a time. Here, 'running' does not mean executing anything: even when I don't run anything in a notebook, its app is still shown as 'running'. Because of this I can't share my resources between notebooks, which is quite sad (I currently have to kill the first shell, i.e. the first notebook kernel, to run the second).
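A hedged aside on the resource-contention part: in standalone mode each application claims every available worker core by default, so a second shell sits in WAITING state until the first one is killed. Capping spark.cores.max per app (the value 2 below is purely illustrative) should let two shells coexist on the cluster:

```json
"PYSPARK_SUBMIT_ARGS": "--master spark://<mymaster>:7077 --conf spark.executor.memory=2G --conf spark.cores.max=2 pyspark-shell"
```

This caps how many total cores one shell may claim; it does not fix the metastore locking issue described in the answer below, which is a separate problem.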
If you have any ideas about how to do this, let me know! Also, I'm not sure whether the way I'm working with kernels is best practice; I already had trouble just getting Spark and Jupyter to work together.
Thanks all
Upvotes: 8
Views: 2464
Reputation: 61
The problem is the database Spark uses for its metastore (Derby). Derby is a lightweight embedded database that allows only one connection, and hence only one Spark instance, at a time. The solution is to set up a database system that handles multiple concurrent instances (PostgreSQL, MySQL, ...).
For example, you can use a PostgreSQL database.
Example on a Linux shell:
# download the PostgreSQL JDBC jar
wget https://jdbc.postgresql.org/download/postgresql-42.1.4.jar
# install the PostgreSQL server (Debian/Ubuntu; use your distro's package manager)
sudo apt-get install postgresql
# create the user, set its password, and create the metastore database
psql -d postgres -c "create user hive"
psql -d postgres -c "alter user hive with password 'pass'"
psql -d postgres -c "create database hive_metastore"
psql -d postgres -c "grant all privileges on database hive_metastore to hive"
Then point Spark at the new metastore with a hive-site.xml (placed in $SPARK_HOME/conf):
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://localhost:5432/hive_metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>pass</value>
</property>
</configuration>
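With the XML above in place, the Spark driver still needs the PostgreSQL JDBC driver on its classpath. A hedged sketch of the corresponding change to the kernel's env, reusing the question's kernel config (the jar path is illustrative; adjust it to wherever you saved the downloaded jar):

```json
"PYSPARK_SUBMIT_ARGS": "--master spark://<mymaster>:7077 --conf spark.executor.memory=2G --driver-class-path /opt/vertica/java/lib/vertica-jdbc.jar:<mypath>/postgresql-42.1.4.jar --jars <mypath>/postgresql-42.1.4.jar pyspark-shell"
```

--driver-class-path puts the jar on the driver's classpath (note the colon-separated entries, keeping the Vertica jar from the original config), while --jars also ships it to the executors.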
Upvotes: 1