MJ Tsai

Reputation: 171

How to configure sampleSize on a SparkSession in Java Spark

I am new to Java Spark.

I am currently having an issue with a MongoDB-to-Hive ETL job in which a field can end up with different data types. I want to increase the sample size to deal with that, but I have only found Scala examples and I am using Java. Does anyone know whether the setup below increases sampleSize properly?

SparkSession spark = SparkSession.builder()
                .master("local[2]")
                .appName("SparkReadMgToHive")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .config("spark.mongodb.input.uri", "mongodb://localhost:27017/test.testcollection")
                .config("sampleSize", 50000)
                .enableHiveSupport()
                .getOrCreate();

Many thanks.

Upvotes: 0

Views: 3039

Answers (1)

bottaio

Reputation: 5093

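The key is spark.mongodb.input.sampleSize, not just sampleSize. When the option is set on the SparkSession builder it needs the same spark.mongodb.input. prefix that you already use for the uri: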

SparkSession spark = SparkSession.builder()
                .master("local[2]")
                .appName("SparkReadMgToHive")
                .config("spark.sql.warehouse.dir", warehouseLocation)
                .config("spark.mongodb.input.uri", "mongodb://localhost:27017/test.testcollection")
                .config("spark.mongodb.input.sampleSize", 50000)
                .enableHiveSupport()
                .getOrCreate();
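
If you set several connector read options this way, the naming rule can be sketched as a small helper. This is only an illustration under the assumption that every session-level read option takes the spark.mongodb.input. prefix seen above; the class MongoInputKeys and its methods are hypothetical names, not part of Spark or the MongoDB connector, and the code deliberately avoids any Spark dependency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark or the MongoDB connector.
final class MongoInputKeys {
    // Prefix taken from the key in the answer: spark.mongodb.input.sampleSize
    static final String PREFIX = "spark.mongodb.input.";

    private MongoInputKeys() {}

    // Turns a short option name such as "sampleSize" into the session-level key.
    // Names that already carry the prefix are returned unchanged.
    static String qualify(String option) {
        return option.startsWith(PREFIX) ? option : PREFIX + option;
    }

    // Qualifies every key in a map of short option names, keeping insertion order.
    static Map<String, String> qualifyAll(Map<String, String> options) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            out.put(qualify(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> shortNames = new LinkedHashMap<>();
        shortNames.put("uri", "mongodb://localhost:27017/test.testcollection");
        shortNames.put("sampleSize", "50000");
        // Each resulting entry could be passed to SparkSession.builder().config(key, value).
        qualifyAll(shortNames).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Each entry of the returned map can then be handed to .config(key, value) on the builder, so a bare key like sampleSize never reaches the session by accident.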

Upvotes: 2
