Reputation: 679
I need to use com.databricks.spark.xml from a Google Cloud notebook.
I tried:
import os
#os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.6.0 pyspark-shell'
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.10:0.4.1 pyspark-shell'
articles_df = (spark.read.format('xml')
    .options(rootTag='articles', rowTag='article')
    .load('gs://....-20180831.xml', schema=articles_schema))
but I'm getting:
java.lang.ClassNotFoundException: Failed to find data source: xml. Please find packages at http://spark.apache.org/third-party-projects.html
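For context, `PYSPARK_SUBMIT_ARGS` is only read when the JVM is launched, i.e. when the first SparkSession is created, so setting it after the notebook's session already exists has no effect; the package's Scala suffix must also match the cluster's Scala version. A minimal sketch of building the submit args before any session exists (the `_2.11:0.6.0` coordinates are an assumption for a Spark 2.4-era cluster):

```python
import os

# Assumption: a Spark 2.4-era cluster built against Scala 2.11; the
# _2.10:0.4.1 artifact targets older Spark builds.
scala_version = '2.11'
package = f'com.databricks:spark-xml_{scala_version}:0.6.0'

# Must be set BEFORE the first SparkSession is created -- the variable is
# only consulted when the JVM launches, never afterwards.
os.environ['PYSPARK_SUBMIT_ARGS'] = f'--packages {package} pyspark-shell'

print(os.environ['PYSPARK_SUBMIT_ARGS'])
# --packages com.databricks:spark-xml_2.11:0.6.0 pyspark-shell

# Only then create the session and read the XML, e.g.:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = (spark.read.format('xml')
#         .options(rootTag='articles', rowTag='article')
#         .load('gs://....-20180831.xml'))
```

If the session already exists, it must be stopped and the kernel restarted for the new `--packages` to be picked up.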
Upvotes: 0
Views: 1254