Reputation: 1645
I have a group of Excel sheets that I am trying to read via Spark through the com.crealytics.spark.excel package. My Excel sheet has a column, Survey ID, that contains integer IDs. When I read the data through Spark, the values are converted to doubles.
How can I retain the integer format of the values while reading from the Excel sheet?
This is what I tried :
val df = spark.read.format("com.crealytics.spark.excel")
.option("location", <somelocation>)
.option("useHeader", "true")
.option("treatEmptyValuesAsNulls", "true")
.option("inferSchema", "true")
.option("addColorColumns", "false")
.load()
The actual values are plain integers (e.g. 17632889), but the values read via Spark look like this:
+-----------+
| Survey ID|
+-----------+
|1.7632889E7|
|1.7632889E7|
|1.7632934E7|
|1.7633233E7|
|1.7633534E7|
|1.7655812E7|
|1.7656079E7|
|1.7930478E7|
|1.7944498E7|
|1.8071246E7|
+-----------+
Casting the column to integer gives me the required format, but is there a better way to do this?
val finalDf = df.withColumn("Survey ID", col("Survey ID").cast(sql.types.IntegerType))
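One alternative sketch: since DataFrameReader.schema is standard Spark and spark-excel honors a user-supplied schema, you can declare the column types up front instead of relying on inferSchema. The schema below only lists Survey ID; the remaining columns and the path are placeholders you would fill in for your sheet.

```scala
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Declare the schema explicitly so "Survey ID" is read as an integer
// rather than being inferred as a double.
val surveySchema = StructType(Seq(
  StructField("Survey ID", IntegerType, nullable = true)
  // add the rest of your sheet's columns here
))

val df = spark.read.format("com.crealytics.spark.excel")
  .option("location", "<somelocation>")
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .schema(surveySchema) // explicit schema replaces .option("inferSchema", "true")
  .load()
```

This avoids the cast entirely, at the cost of maintaining the schema by hand when the sheet layout changes.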
Upvotes: 1
Views: 1921
Reputation: 1593
There is a bug (or rather a missing setting) in the Excel library that renders columns with large numbers in scientific notation. See https://github.com/crealytics/spark-excel/issues/126
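Until that issue is resolved, a post-read workaround is to cast the affected columns back. A hedged sketch using only the plain Spark API, under the assumption that every Double column in your sheet is really a whole-number ID (adjust the predicate if that is not true for your data):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, LongType}

// Cast every column that inferSchema typed as Double back to Long,
// leaving all other columns untouched.
val fixed = df.schema.fields.foldLeft(df) { (acc, field) =>
  if (field.dataType == DoubleType)
    acc.withColumn(field.name, col(field.name).cast(LongType))
  else
    acc
}
```

LongType is used rather than IntegerType so IDs larger than Int.MaxValue do not overflow.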
Upvotes: 1