S_S

Reputation: 1412

Dask read CSV files recursively from directories

Given the following directory structure:

Folder
  Sub-Folder1
           File1.csv
           File2.csv
           File3.csv
           File4.csv
  Sub-Folder2
           File1.csv
           File2.csv
  Sub-Folder3
           File1.csv
           File2.csv

How can I use read_csv of Dask to read all the CSV files in these folders, each into one partition?

Upvotes: 5

Views: 1383

Answers (1)

Corralien

Reputation: 120509

IIUC, you can pass a recursive glob pattern to read_csv, where ** matches subfolders at any depth:

import dask.dataframe as dd

dfs = dd.read_csv('Folder/**/*.csv')

Output:

>>> dfs
Dask DataFrame Structure:
                   A      B      C
npartitions=8                     
               int64  int64  int64
                 ...    ...    ...
...              ...    ...    ...
                 ...    ...    ...
                 ...    ...    ...
Dask Name: read-csv, 8 tasks
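
Note that the 8 partitions above match the 8 files only because each file fits in a single block; by default read_csv may split a large file into several partitions. A minimal sketch for forcing one partition per file, assuming the layout from the question: passing blocksize=None to read_csv turns off that splitting. The helper names list_csv_files and read_one_partition_per_file are hypothetical, and the file list is built with pathlib rather than the glob string, so the exact set of files can be checked before reading.

```python
from pathlib import Path


def list_csv_files(root):
    """Return every .csv file under root (at any depth), sorted for a stable order."""
    return sorted(str(p) for p in Path(root).rglob("*.csv") if p.is_file())


def read_one_partition_per_file(root):
    """Lazily read all CSVs under root so that each file becomes one partition."""
    # Imported here so list_csv_files can be used without Dask installed.
    import dask.dataframe as dd

    # blocksize=None tells Dask not to split a file into several blocks,
    # so the number of partitions equals the number of files.
    return dd.read_csv(list_csv_files(root), blocksize=None)
```

Calling read_one_partition_per_file('Folder') on the tree in the question should give a Dask DataFrame whose npartitions equals the number of CSV files found. The same blocksize=None argument can also be passed together with the glob string directly.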

Upvotes: 4