kikee1222

Reputation: 1996

Reading files into a pyspark dataframe from directories and subdirectories

I have the code below to read all files within a directory, but I am struggling to pick up files in the subdirectories as well. I won't always know what the subdirectories are, so I cannot define them explicitly.

Can anyone advise me please?

df = my_spark.read.format("csv").option("header", "true").load(yesterday+"/*.csv")

Upvotes: 1

Views: 2220

Answers (2)

DataWrangler

Reputation: 2165

Use wildcards after the directory location from which you want to read all the subdirectories:

"path/*/*"

Upvotes: 2

kikee1222

Reputation: 1996

Thanks to Joby, whose comment solved it:

can you try giving wildcards in this way and see "path/*/*" – Joby 23 hours ago

Upvotes: 0
