Load only the first few XML files (e.g. 10) from a directory containing 100 files into a PySpark DataFrame

Question

I want to load the first 10 XML files in each iteration from a directory containing 100 files, and move each file that has already been read to another directory.

What I have tried so far in PySpark:

li = ["/mnt/dev/tmp/xml/100_file/M800143.xml","/mnt/dev/tmp/xml/100_file/M8001422.xml"]
df1 = spark.read.format("com.databricks.spark.xml").option("rowTag","Quality").load(li) 
df1.show()

But I am getting an error: IllegalArgumentException: 'path' must be specified for XML data.

Is there any way to read the files after storing their full paths in a list? Otherwise, please suggest another approach.
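
One possible approach, offered as a sketch rather than a confirmed fix: the error suggests that this version of the XML reader expects a single path string, not a Python list. Because the reader hands that string to Hadoop's input format, a comma-separated string of paths may be accepted; that is an assumption to verify against the installed spark-xml version. The helper names below (next_batch, load_batch, move_processed, process_all) and the handle callback are hypothetical and not part of any library. The sketch also assumes the source directory is reachable through the local filesystem API under the same path that Spark uses; on Databricks, dbutils.fs.ls and dbutils.fs.mv are the usual replacements for os.listdir and shutil.move.

```python
import os
import shutil


def next_batch(src_dir, batch_size=10):
    """Return full paths of up to batch_size .xml files, sorted by file name."""
    names = sorted(n for n in os.listdir(src_dir) if n.lower().endswith(".xml"))
    return [os.path.join(src_dir, n) for n in names[:batch_size]]


def load_batch(spark, paths, row_tag="Quality"):
    """Read one batch of XML files into a single DataFrame.

    Assumption: the XML data source accepts a comma-separated path string.
    """
    return (
        spark.read.format("com.databricks.spark.xml")
        .option("rowTag", row_tag)
        .load(",".join(paths))
    )


def move_processed(paths, done_dir):
    """Move files that have been read into done_dir, keeping their names."""
    os.makedirs(done_dir, exist_ok=True)
    for p in paths:
        shutil.move(p, os.path.join(done_dir, os.path.basename(p)))


def process_all(spark, src_dir, done_dir, handle, batch_size=10):
    """Process src_dir in batches until no .xml files are left."""
    while True:
        paths = next_batch(src_dir, batch_size)
        if not paths:
            break
        df = load_batch(spark, paths)
        # Spark is lazy: handle must run an action (e.g. a write or count)
        # so the files are actually read before they are moved away.
        handle(df)
        move_processed(paths, done_dir)
```

With 100 files and batch_size=10, process_all would run ten iterations, for example process_all(spark, "/mnt/dev/tmp/xml/100_file", done_dir, handle), where done_dir is whatever target directory you choose and handle writes or otherwise materializes each batch. Moving the files only after handle returns keeps a failed batch in the source directory so it can be retried.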
