Reputation: 171
I am new to Java Spark.
I am currently having an issue with a MongoDB-to-Hive ETL job: the same field can end up with different data types. I want to increase the sample size the connector uses to infer the schema, but the only examples I can find are in Scala and I am using Java. Does anyone know whether I have set the sample size correctly below?
SparkSession spark = SparkSession.builder()
    .master("local[2]")
    .appName("SparkReadMgToHive")
    .config("spark.sql.warehouse.dir", warehouseLocation)
    .config("spark.mongodb.input.uri", "mongodb://localhost:27017/test.testcollection")
    .config("sampleSize", 50000)
    .enableHiveSupport()
    .getOrCreate();
Many thanks.
Upvotes: 0
Views: 3039
Reputation: 5093
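The key is spark.mongodb.input.sampleSize. When you set it on the SparkSession builder, it needs the same spark.mongodb.input. prefix as the uri setting; a bare sampleSize key is not picked up there.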
SparkSession spark = SparkSession.builder()
    .master("local[2]")
    .appName("SparkReadMgToHive")
    .config("spark.sql.warehouse.dir", warehouseLocation)
    .config("spark.mongodb.input.uri", "mongodb://localhost:27017/test.testcollection")
    .config("spark.mongodb.input.sampleSize", 50000)
    .enableHiveSupport()
    .getOrCreate();
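If you set several read options this way, a small helper that adds the prefix for you can help avoid the bare-key mistake. This is only a sketch: the class MongoInputOptions and its prefixed method are hypothetical names, not part of the connector, and it assumes the spark.mongodb.input. prefix used above. It needs nothing beyond the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the MongoDB connector): adds the
// "spark.mongodb.input." prefix to read options that lack it.
final class MongoInputOptions {
    // Assumed prefix, taken from the keys used in this thread.
    static final String PREFIX = "spark.mongodb.input.";

    // Returns a new map whose keys all start with PREFIX; keys that
    // already carry the prefix are left unchanged.
    static Map<String, String> prefixed(Map<String, String> options) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            String key = e.getKey();
            out.put(key.startsWith(PREFIX) ? key : PREFIX + key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("uri", "mongodb://localhost:27017/test.testcollection");
        opts.put("sampleSize", "50000");
        // Print the fully qualified keys, one per line.
        prefixed(opts).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

You could then loop over the returned map and pass each entry to the builder's config(key, value) before calling getOrCreate().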
Upvotes: 2