Reputation: 5139
As the title says, how do I know which version of Spark has been installed on CentOS?
The current system has CDH 5.1.0 installed.
Upvotes: 89
Views: 166463
Reputation: 21
Try it this way:
import util.Properties.versionString
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder
  .appName("my_app")
  .master("local[6]")
  .getOrCreate()
println("Spark Version: " + spark.version)
println("Scala Version: " + versionString)
Upvotes: 2
Reputation: 586
If, like me, you are running Spark inside a Docker container and have little use for spark-shell, you can run a Jupyter notebook, build a SparkContext object called sc in the notebook, and read the version as shown in the code below:
docker run -p 8888:8888 jupyter/pyspark-notebook  # run this in the shell where Docker is installed
import pyspark
sc = pyspark.SparkContext('local[*]')
sc.version
Upvotes: 1
Reputation: 35404
Use spark.version, where the spark variable is the SparkSession object.
spark-shell
[root@bdhost001 ~]$ spark-shell
Setting the default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.0
/_/
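Once the shell is up, the same value can also be printed from the prompt; a quick sketch (spark and sc are the SparkSession and SparkContext that spark-shell pre-creates, and the printed value will match your installation, 2.2.0 in this case):

scala> spark.version   // SparkSession, available in Spark 2.x+ shells
res0: String = 2.2.0

scala> sc.version      // SparkContext, also works in older shells
res1: String = 2.2.0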
spark-shell --version
[root@bdhost001 ~]$ spark-shell --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.0
/_/
Type --help for more information.
spark-submit --version
[root@bdhost001 ~]$ spark-submit --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.0
/_/
Type --help for more information.
Upvotes: 34
Reputation: 5648
A non-interactive way, which I use on AWS EMR to install the matching PySpark version:
# pip3 install pyspark==$(spark-submit --version 2>&1| grep -m 1 -Eo "([0-9]{1,}\.)+[0-9]{1,}")
Collecting pyspark==2.4.4
spark-shell solution:
# spark-shell --version 2>&1| grep -m 1 -Eo "([0-9]{1,}\.)+[0-9]{1,}"
2.4.4
spark-submit solution:
# spark-submit --version 2>&1| grep -m 1 -Eo "([0-9]{1,}\.)+[0-9]{1,}"
2.4.4
Upvotes: -1
Reputation: 442
If you want to print the version programmatically, use:
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").getOrCreate()
print(spark.sparkContext.version)
Upvotes: 5
Reputation: 9044
Most of the answers here require initializing a SparkSession. This answer shows a way to statically read the version from the library, here in an Ammonite REPL:
@ org.apache.spark.SPARK_VERSION
res4: String = "2.4.5"
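The same constant can be read from any Scala program that has spark-core on the classpath, without starting a session; a small sketch (the object name PrintSparkVersion is just illustrative):

object PrintSparkVersion {
  def main(args: Array[String]): Unit = {
    // SPARK_VERSION is a value exposed by the org.apache.spark package object,
    // so no SparkSession or SparkContext is needed here.
    println(org.apache.spark.SPARK_VERSION)
  }
}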
Upvotes: 1
Reputation: 426
If you want to get it programmatically using a Python script, you can use this script.py:
from pyspark.context import SparkContext
from pyspark import SparkConf
sc_conf = SparkConf()
sc = SparkContext(conf=sc_conf)
print(sc.version)
Run it with python script.py or python3 script.py.
The script above also works in the Python shell.
Note that calling print(sc.version) on its own, without first creating the SparkContext, won't work; you will get this error: NameError: name 'sc' is not defined.
Upvotes: 2
Reputation: 9
To print the Spark version in the shell, the following solution works:
SPARK_VERSION=$(spark-shell --version &> tmp.data ; grep version tmp.data | head -1 | awk '{print $NF}';rm tmp.data)
echo $SPARK_VERSION
Upvotes: -1
Reputation: 4797
If you are in a Zeppelin notebook you can run:
sc.version
To get the Scala version as well, you can run:
util.Properties.versionString
Upvotes: 4
Reputation: 502
If you are using pyspark, the Spark version in use can be seen beside the Spark logo, as shown below:
manoj@hadoop-host:~$ pyspark
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
SparkContext available as sc, HiveContext available as sqlContext.
>>>
If you want to get the Spark version explicitly, you can use the version method of SparkContext, as shown below:
>>>
>>> sc.version
u'1.6.0'
>>>
Upvotes: 5
Reputation: 737
If you are using Databricks and working in a notebook, just run:
spark.version
Upvotes: 16
Reputation: 188
Whichever shell command you use, spark-shell or pyspark, it will land on a Spark logo with the version beside it.
$ pyspark
Python 2.6.6 (r266:84292, May 22 2015, 08:34:51)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-15)] on linux2
............
...........
Welcome to
version 1.3.0
Upvotes: 4
Reputation: 8851
If you use spark-shell, the version appears in the banner at startup.
Programmatically, SparkContext.version can be used.
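A minimal Scala sketch of that programmatic route, assuming a local run (the app name "version-check" and the local[*] master are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object SparkVersionCheck {
  def main(args: Array[String]): Unit = {
    // Start a throwaway local SparkContext only to query its version, then stop it.
    val sc = new SparkContext(new SparkConf().setAppName("version-check").setMaster("local[*]"))
    println(sc.version)
    sc.stop()
  }
}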
Upvotes: 110