Ayan Biswas

Reputation: 1645

Reading an Excel file in Spark with an integer column

I have a group of Excel sheets that I am trying to read via Spark through the com.crealytics.spark.excel package. In my Excel sheet I have a column, Survey ID, that contains integer IDs. When I read the data through Spark, I see the values are converted to doubles.

How can I retain the integer format of the values while reading from the Excel sheet?

This is what I tried:

val df = spark.read.format("com.crealytics.spark.excel")
      .option("location", <somelocation>)
      .option("useHeader", "true")
      .option("treatEmptyValuesAsNulls", "true")
      .option("inferSchema", "true")
      .option("addColorColumns", "false")
      .load()

Actual Value

(screenshot: the Survey ID column in Excel shows plain integers, e.g. 17632889)

Value read via Spark

+-----------+
|  Survey ID|
+-----------+
|1.7632889E7|
|1.7632889E7|
|1.7632934E7|
|1.7633233E7|
|1.7633534E7|
|1.7655812E7|
|1.7656079E7|
|1.7930478E7|
|1.7944498E7|
|1.8071246E7|
+-----------+

If I cast the column to integer, I get the data in the required format. But is there a better way to do this?

val finalDf = df.withColumn("Survey ID", col("Survey ID").cast(sql.types.IntegerType))
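
One alternative I am considering is to skip inferSchema and pass an explicit schema to the reader instead, so the column is never inferred as double in the first place. A sketch, assuming the data source honours the standard DataFrameReader.schema API (the schema below only lists Survey ID; any other columns in the sheet would need their own StructFields, and dfTyped is just a name I chose here):

import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Declare "Survey ID" as an integer up front so no type inference happens.
// Every other column in the sheet would need a StructField here as well.
val surveySchema = StructType(Seq(
  StructField("Survey ID", IntegerType, nullable = true)
))

val dfTyped = spark.read.format("com.crealytics.spark.excel")
      .option("location", <somelocation>)
      .option("useHeader", "true")
      .option("treatEmptyValuesAsNulls", "true")
      .schema(surveySchema) // replaces .option("inferSchema", "true")
      .load()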

Upvotes: 1

Views: 1921

Answers (1)

D3V

Reputation: 1593

There is a bug (or rather a missing setting) in the Excel library which renders columns with large numbers in scientific notation. See https://github.com/crealytics/spark-excel/issues/126
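
Until that is addressed, the usual workaround is exactly the cast you already found: convert the inferred double column back to an integral type yourself. A sketch (I picked LongType defensively in case the IDs ever exceed Int.MaxValue; fixedDf is just an illustrative name):

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

// The column comes back as DoubleType from schema inference; casting to a
// long removes the scientific-notation rendering and cannot overflow for
// IDs of this magnitude.
val fixedDf = df.withColumn("Survey ID", col("Survey ID").cast(LongType))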

Upvotes: 1
