Reputation: 2196
I'm a newbie with Spark and need parallelizePairs()
(I'm working in Java).
First, I start my driver with:
SparkSession spark = SparkSession
.builder()
.appName("My App")
.config("driver", "org.postgresql.Driver")
.getOrCreate();
But spark
doesn't have the function I need; the context I get
through spark.sparkContext() only offers parallelize().
Now I'm tempted to add
SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("My App");
JavaSparkContext context = new JavaSparkContext(sparkConf);
This way, context has the function I need, but I'm very confused here.
First, I never needed a JavaSparkContext
before, because I run with spark-submit
and set the master address there.
Second, why is spark.sparkContext()
not the same as JavaSparkContext,
and how do I get it from the SparkSession?
If I'm passing the master on the command line, must I also set sparkConf.setMaster( '<master-address-again>' )?
I already read this: How to create SparkSession from existing SparkContext and understood the problem, but I really need the builder way because I need to pass .config("driver", "org.postgresql.Driver")
to it.
Please shed some light here...
EDIT
Dataset<Row> graphDatabaseTable = spark.read()
.format("jdbc")
.option("url", "jdbc:postgresql://192.168.25.103:5432/graphx")
.option("dbtable", "public.select_graphs")
.option("user", "postgres")
.option("password", "admin")
.option("driver", "org.postgresql.Driver")
.load();
SQLContext graphDatabaseContext = graphDatabaseTable.sqlContext();
graphDatabaseTable.createOrReplaceTempView("select_graphs");
String sql = "select * from select_graphs where parameter_id = " + indexParameter;
Dataset<Row> graphs = graphDatabaseContext.sql(sql);
Upvotes: 2
Views: 3257
Reputation: 39
SparkSession spark = SparkSession
    .builder()
    .appName("My App")
    .config("driver", "org.postgresql.Driver")
    .getOrCreate();

// Note: this returns the Scala SparkContext, not a JavaSparkContext
SparkContext sparkContext = spark.sparkContext();
Upvotes: 0
Reputation: 35249
Initialize JavaSparkContext
using the existing SparkContext:
JavaSparkContext context = new JavaSparkContext(spark.sparkContext());
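That context then exposes parallelizePairs(). A minimal sketch (the key/value data is made up purely for illustration; it assumes java.util.Arrays, java.util.List, scala.Tuple2 and org.apache.spark.api.java.JavaPairRDD are imported):

// Example pairs, purely for illustration
List<Tuple2<Integer, String>> pairs = Arrays.asList(
    new Tuple2<>(1, "a"),
    new Tuple2<>(2, "b"));

// parallelizePairs is available on JavaSparkContext, not on the Scala SparkContext
JavaPairRDD<Integer, String> pairRDD = context.parallelizePairs(pairs);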
why is spark.sparkContext() not the same as JavaSparkContext, and how do I get it from the SparkSession
In short, because Scala is a much richer language than Java, and JavaSparkContext
is a convenience wrapper designed to get around some Java limitations. At the same time, the RDD API is being moved to the side.
If I'm passing the master on the command line, must I also set sparkConf.setMaster( '<master-address-again>' )?
No. Precedence is (from highest to lowest): explicit configuration in the application (SparkConf
and SparkContext
options), spark-submit command-line arguments, then the configuration files. The master passed to spark-submit is picked up as long as you don't override it in code.
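So, assuming the job is launched with something like spark-submit --master <master-address> ..., a sketch of a driver that leaves the master to the command line:

// No setMaster() here; the master comes from spark-submit (or spark-defaults.conf)
SparkSession spark = SparkSession
    .builder()
    .appName("My App")
    .getOrCreate();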
but I really need the builder way because I need to pass .config("driver", "org.postgresql.Driver") to it
It doesn't look right. The driver
option is used by DataFrameWriter
and DataFrameReader,
so it should be passed there, not to the session builder.
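In other words (a minimal sketch with placeholder connection details, mirroring the reader setup from your edit):

// Pass the JDBC driver to the reader, not to SparkSession.builder()
Dataset<Row> table = spark.read()
    .format("jdbc")
    .option("url", "jdbc:postgresql://<host>:5432/<database>")
    .option("dbtable", "public.select_graphs")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "org.postgresql.Driver")
    .load();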
Upvotes: 4