Reputation: 434
I have an Excel file (.xlsx) in the data lake. I need to read that file into a PySpark dataframe. I do not want to use the pandas library.
I have installed the crealytics spark-excel library on my Databricks cluster and tried the code below:
dbutils.fs.cp('/path/to/excel/file','/FileStore/tables/',True)
path='/dbfs/FileStore/tables/myfile1.xlsx'
excel_df=spark.read.format("com.crealytics.spark.excel").option("header","true").option("inferSchema","true").load("/FileStore/tables/myfile1.xlsx")
I'm getting the below error:
java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
Am I missing anything here, or is there another approach I can try other than pandas? I also need to read multiple sheets from the Excel file. Please suggest.
Upvotes: 1
Views: 6732
Reputation: 21
I was getting the same error. It turned out the problem was the package version. I installed the newer version 0.13.8 built for Scala 2.12, and it's working now.
path="/mnt/replacemountpointname/path/filename.xlsx"
df = spark.read.format("com.crealytics.spark.excel").options(header='True', inferSchema='True').load(path)
Link for ref: https://www.youtube.com/watch?v=ib8Zch_4744
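To address the multiple-sheets part of the question: spark-excel reads one sheet per `.load()` call, selected with the `dataAddress` option (e.g. `'Sheet2'!A1`), so you loop over the sheet names and read each one. A minimal sketch, assuming you know the sheet names in advance; `sheet_address` and `read_all_sheets` are hypothetical helper names for illustration, not part of the library:

```python
# Sketch: read every sheet of a workbook with the crealytics spark-excel
# reader, one .load() per sheet. Sheet names are assumed known up front.

def sheet_address(sheet_name, start_cell="A1"):
    """Build a dataAddress string like 'Sheet1'!A1 for spark-excel."""
    return f"'{sheet_name}'!{start_cell}"

def read_all_sheets(spark, path, sheet_names):
    """Return a dict mapping sheet name -> DataFrame, one read per sheet."""
    dfs = {}
    for name in sheet_names:
        dfs[name] = (
            spark.read.format("com.crealytics.spark.excel")
            .option("header", "true")
            .option("inferSchema", "true")
            .option("dataAddress", sheet_address(name))  # e.g. 'Sheet2'!A1
            .load(path)
        )
    return dfs
```

You could then call `read_all_sheets(spark, path, ["Sheet1", "Sheet2"])` and union the resulting DataFrames if the sheets share a schema.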
Upvotes: 2