Reputation: 994
I have a simple JSON as below, the value node sometimes will have STRING, and sometime will have DOUBLE. I want to treat the value as STRING as it comes. But when spark see that tag is double its converting to a different format with E
Input JSON
{"key" : "k1", "value": "86093351508521808.0"}
{"key" : "k2", "value": 86093351508521808.0}
Spark output CSV
k1,86093351508521808.0
k2,8.6093351508521808E16
Expected output
k1,86093351508521808.0
k2,86093351508521808.0
Please advise on how the desired output could be achieved. We never read the value in the tag, so we will never be aware of the precision and other details.
Below is the sample code
public static void main(String[] args) {
SparkSession sparkSession = SparkSession
.builder()
.appName(TestSpark.class.getName())
.master("local[*]").getOrCreate();
SparkContext context = sparkSession.sparkContext();
context.setLogLevel("ERROR");
SQLContext sqlCtx = sparkSession.sqlContext();
System.out.println("Spark context established");
List<StructField> kvFields = new ArrayList<>();
kvFields.add(DataTypes.createStructField("key", DataTypes.StringType, true));
kvFields.add(DataTypes.createStructField("value", DataTypes.StringType, true));
StructType employeeSchema = DataTypes.createStructType(kvFields);
Dataset<Row> dataset = sparkSession.read()
.option("inferSchema", false)
.format("json")
.schema(employeeSchema)
.load("D:\\dev\\workspace\\java\\simple-kafka\\key_value.json");
dataset.createOrReplaceTempView("sourceView");
sqlCtx.sql("select * from sourceView ")
.write()
.format("csv")
.save("D:\\dev\\workspace\\java\\simple-kafka\\output\\" + UUID.randomUUID().toString());
sparkSession.close();
}
Upvotes: 0
Views: 1212
Reputation: 423
We can cast that column to DecimalType as follow:
scala> import org.apache.spark.sql.types.DecimalType;
import org.apache.spark.sql.types.DecimalType
scala> spark.read.json(sc.parallelize(Seq("""{"key" : "k1", "value": "86093351508521808.0"}""","""{"key" : "k2", "value": 86093351508521808.0}"""))).select(col("value").cast(DecimalType(28, 1))).show
+-------------------+
| value|
+-------------------+
|86093351508521808.0|
|86093351508521808.0|
+-------------------+
Upvotes: 1