Reza Mirhossein

Reputation: 41

reading multiple files into dask dataframe

I want to read multiple csv files into one single dask dataframe. For some reason, a portion of my original data gets lost (no clue why!). I am wondering what's the best method to read them all into dask. I used a for loop, though I'm not sure if it's correct.

for file in os.listdir(dds_glob):
    if file.endswith('issued_processed.txt'):
        ddf = dd.read_fwf(os.path.join(dds_glob,file),
                          colspecs=cols,
                          header=None,
                          dtype=object,
                          names=names)

or should I use something like this:

dfs = delayed(pd.read_fwf)('/data/input/*issued_processed.txt',
                           colspecs=cols,
                           header=None,
                           dtype=object,
                           names=names)  
ddf = dd.from_delayed(dfs)

Upvotes: 2

Views: 1124

Answers (1)

SultanOrazbayev

Reputation: 16561

There are at least two approaches:

  1. provide dask.dataframe with a list of files, so using your first snippet it would look like:
file_list = [
    os.path.join(dds_glob,file)
    for file in os.listdir(dds_glob) if file.endswith('issued_processed.txt')
]

# other options are skipped for convenience
ddf = dd.read_fwf(file_list)
  2. construct a dataframe from delayed objects, which using your second snippet would look like:
# other options are skipped, but can be included after the `file`
dfs = [delayed(pd.read_fwf)(file) for file in file_list] 
ddf = dd.from_delayed(dfs)

The first approach will solve the vast majority of use-cases, but for the remaining cases you might need to try the second approach or something more involved.
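Putting the second approach together, here is a minimal runnable sketch. The column specs, names, and throwaway input files are hypothetical stand-ins for the real data; it reads each file eagerly with plain pandas so the per-file step is easy to verify. Wrapping the same `pd.read_fwf` calls in `delayed(...)` and passing the resulting list to `dd.from_delayed` gives the lazy dask version, which concatenates the pieces much like the eager `pd.concat` below:

```python
import os
import tempfile

import pandas as pd

# Hypothetical fixed-width layout: two 5-character columns.
cols = [(0, 5), (5, 10)]
names = ["a", "b"]

# Throwaway input files standing in for the real ones under dds_glob.
tmpdir = tempfile.mkdtemp()
samples = {
    "part0_issued_processed.txt": "AAAAA11111\nBBBBB22222\n",
    "part1_issued_processed.txt": "CCCCC33333\n",
    "ignore_me.csv": "x,y\n1,2\n",  # should be filtered out by the suffix check
}
for fname, body in samples.items():
    with open(os.path.join(tmpdir, fname), "w") as f:
        f.write(body)

# Build the file list exactly as in the first snippet.
file_list = sorted(
    os.path.join(tmpdir, f)
    for f in os.listdir(tmpdir)
    if f.endswith("issued_processed.txt")
)

# Each element here is what one delayed(pd.read_fwf)(...) task would compute;
# dd.from_delayed would stitch them together lazily, as pd.concat does eagerly.
frames = [
    pd.read_fwf(f, colspecs=cols, header=None, dtype=object, names=names)
    for f in file_list
]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

The suffix filter matters: anything else sitting in the directory (logs, CSV exports) would otherwise end up in the concatenated frame with mismatched columns.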

Upvotes: 2
