kikee1222

Reputation: 1996

Reading files into a pyspark dataframe from directories and subdirectories

I have the code below to read all files within a directory, but I am struggling to pick up files in the subdirectories as well. I won't always know what the subdirectories are, so I cannot define them explicitly.

Can anyone advise me please?

df = my_spark.read.format("csv").option("header", "true").load(yesterday+"/*.csv")

Upvotes: 1

Views: 2220

Answers (2)

DataWrangler

Reputation: 2165

Use wildcards after the directory location from which you want to read all the subdirectories:

"path/*/*"

Upvotes: 2

kikee1222

Reputation: 1996

Thanks to Joby, whose comment solved it:

can you try giving wildcards in this way and see "path/*/*" – Joby 23 hours ago

Upvotes: 0
