Reputation: 49
I want to read an Excel file with multiple sheets from my Azure Blob Storage Gen2 account using Databricks PySpark. I have already installed the Maven package. Below is my code:
df = spark.read.format('com.crealytics.spark.excel') \
    .option("header", "true") \
    .option("useHeader", "true") \
    .option("treatEmptyValuesAsNulls", "true") \
    .option("inferSchema", "true") \
    .option("sheetName", "sheet1") \
    .option("maxRowsInMemory", 10) \
    .load(file_path)
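(As a side note, the option names vary across connector versions: older spark-excel releases use useHeader and sheetName, while newer releases use header and select sheets via dataAddress. A minimal sketch of the newer style, assuming the same file_path and a sheet named sheet1:)

# Hedged sketch: newer spark-excel versions select a sheet via dataAddress
# rather than sheetName; file_path and the sheet name are placeholders.
df = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .option("dataAddress", "'sheet1'!A1") \
    .load(file_path)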
Running this code, I get the following error:
Py4JJavaError: An error occurred while calling o323.load.
: java.lang.NoClassDefFoundError: Could not initialize class com.crealytics.spark.excel.WorkbookReader$
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:22)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:390)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:444)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:400)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:400)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:295)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:251)
	at java.lang.Thread.run(Thread.java:748)
Any help is appreciated. Thanks
Upvotes: 1
Views: 1310
Reputation: 5044
Can you verify that you have properly mounted the Azure Blob storage container?
Check out the official MS doc: Access Azure Blob storage using the RDD API
Hadoop configuration options are not accessible via SparkContext. If you are using the RDD API to read from Azure Blob storage, you must set the Hadoop credential configuration properties as Spark configuration options when you create the cluster, adding the spark.hadoop. prefix to the corresponding Hadoop configuration keys to propagate them to the Hadoop configurations used for your RDD jobs.
Configure an account access key:
spark.hadoop.fs.azure.account.key.<storage-account-name>.blob.core.windows.net <storage-account-access-key>
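If the container is not mounted yet, here is a minimal sketch using dbutils.fs.mount (every angle-bracketed name below is a placeholder, not a value from your setup):

# Hedged sketch: mount an Azure Blob storage container in Databricks.
# Replace all <...> placeholders with your own storage account, container,
# mount point, and secret scope/key names.
dbutils.fs.mount(
    source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point="/mnt/<mount-name>",
    extra_configs={
        "fs.azure.account.key.<storage-account-name>.blob.core.windows.net":
            dbutils.secrets.get(scope="<scope-name>", key="<key-name>")
    }
)

Once mounted, you can point your reader at the mount path, e.g. file_path = "/mnt/<mount-name>/path/to/file.xlsx".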
Upvotes: 2