For the following directory structure:

Folder/
    Sub-Folder1/
        File1.csv
        File2.csv
        File3.csv
        File4.csv
    Sub-Folder2/
        File1.csv
        File2.csv
    Sub-Folder3/
        File1.csv
        File2.csv
How can I use Dask's read_csv to read all the CSV files in these folders, with each file loaded into its own partition?
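IIUC, you can pass a recursive glob pattern to read_csv. Setting blocksize=None guarantees one partition per file; with the default block size, a file larger than one block would be split into several partitions:
import dask.dataframe as dd

# '**' also matches CSV files inside nested sub-folders;
# blocksize=None reads each file as a single partition
dfs = dd.read_csv('Folder/**/*.csv', blocksize=None)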
Output:
>>> dfs
Dask DataFrame Structure:
                   A      B      C
npartitions=8
               int64  int64  int64
                 ...    ...    ...
...              ...    ...    ...
                 ...    ...    ...
                 ...    ...    ...
Dask Name: read-csv, 8 tasks
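To check which files a pattern like 'Folder/**/*.csv' picks up before handing it to Dask, you can expand it with Python's standard-library glob using recursive=True, which gives '**' similar "any depth of sub-folders" semantics. This is only a minimal sketch: it assumes the layout from the question, and the temporary directory, the placeholder CSV contents, and the helper names build_tree and matching_files are hypothetical, made up for illustration. Dask does its own glob expansion, so treat this as a sanity check rather than what read_csv runs internally.

```python
import glob
import os
import tempfile

# Hypothetical layout mirroring the question: 4 + 2 + 2 CSV files.
LAYOUT = {"Sub-Folder1": 4, "Sub-Folder2": 2, "Sub-Folder3": 2}


def build_tree(root: str) -> None:
    """Write small placeholder CSV files under root/Folder/<Sub-FolderN>/."""
    for sub, n_files in LAYOUT.items():
        sub_dir = os.path.join(root, "Folder", sub)
        os.makedirs(sub_dir, exist_ok=True)
        for i in range(1, n_files + 1):
            with open(os.path.join(sub_dir, f"File{i}.csv"), "w") as fh:
                fh.write("A,B,C\n1,2,3\n")


def matching_files(root: str) -> list:
    """Return the sorted paths matched by the pattern Folder/**/*.csv."""
    pattern = os.path.join(root, "Folder", "**", "*.csv")
    # recursive=True lets "**" match zero or more nested directories
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_tree(tmp)
        files = matching_files(tmp)
        for path in files:
            print(os.path.relpath(path, tmp))
        print(len(files), "files matched")
```

For the question's layout, the pattern matches 8 files. With blocksize=None each file becomes one partition, which lines up with npartitions=8 in the output above.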