Taukheer

Reputation: 1201

Python - Copying csv files to Dataframe (but skip sub-folders)

I am using the code below to read a set of csv files from a folder into a DataFrame. However, this folder also contains a sub-folder alongside the csv files. How can I skip the sub-folder and read only the csv files? The code below throws an error when I run it against a folder that has a sub-folder.

import pandas as pd
import glob
import numpy as np
import os
import datetime
import time

path = r'/Users/user/desktop/Sales/'


allFiles = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=0)
    list_.append(df)
sale_df = pd.concat(list_)
sale_df

Error message:

IsADirectoryError: [Errno 21] Is a directory: '/Users/user/desktop/Sales/2018-05-03/[email protected] 190982.csv-1525305907670.csv'

Could anyone assist with this? Thanks.

EDIT: The issue is that the subdirectory's name itself contains the extension '.csv', so the "*.csv" pattern matches it.

EDIT in code

path =r'/Users/user/desktop/Sales/2018-05-03/'
files_only = [file for file in 
glob.glob('/Users/user/desktop/Sales/2018-05-03/*.csv') if not 
os.path.isdir(file)]
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(files_only,index_col=None, header=0)
    list_.append(df)
sale_df = pd.concat(list_)
sale_df['filename'] = os.path.basename(csv)
sale_df.append(frame)
sale_df

With this I get the error below:

ValueError: No objects to concatenate

Could you please assist? Thanks.

Upvotes: 1

Views: 410

Answers (2)

tda

Reputation: 2133

My suggestion uses glob.glob to get a list of every file or directory whose name matches the pattern, then uses os.path.isdir to check each match and drop the directories. The result is a list of ONLY the files that match the glob pattern.

import glob
import os

files_only = [file for file in glob.glob('/path/to/files/*.ext') if not os.path.isdir(file)]

You can then use the files_only list in your read_csv loop.
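To see why the filter is needed, here is a minimal, self-contained sketch that builds a temporary folder containing one real csv file and one sub-folder whose name ends in ".csv". The folder layout and the names demo_dir, sales.csv and archive.csv are made up for illustration. It also swaps in os.path.isfile as the test, which should behave the same as "not os.path.isdir" for ordinary files and directories.

```python
import glob
import os
import tempfile

# Throwaway folder: one real CSV plus a sub-folder whose name ends in ".csv".
# All names here are hypothetical and only serve the demonstration.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "sales.csv"), "w") as handle:
    handle.write("item,qty\napple,3\n")
os.mkdir(os.path.join(demo_dir, "archive.csv"))

# glob matches on names only, so the directory is returned as well.
matches = glob.glob(os.path.join(demo_dir, "*.csv"))
names = sorted(os.path.basename(m) for m in matches)

# Keep regular files only; passing the directory to read_csv would fail.
files_only = [m for m in matches if os.path.isfile(m)]
```

Here names contains both entries, while files_only contains only the path to sales.csv.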

So in your code:

import glob
import os
import pandas as pd

files_only = [file for file in glob.glob('/Users/user/desktop/Sales/2018-05-03/*.csv') if not os.path.isdir(file)]
list_ = []
for file in files_only:
    df = pd.read_csv(file, index_col=None, header=0)
    # Record which file each row came from (must happen per file, inside the loop)
    df['filename'] = os.path.basename(file)
    list_.append(df)
# pd.concat raises "ValueError: No objects to concatenate" if list_ is empty
sale_df = pd.concat(list_)
sale_df
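The "ValueError: No objects to concatenate" from the question is what pd.concat raises when it is handed an empty list, which happens whenever the loop never appends anything. Below is a hedged sketch of one way to wrap the whole thing in a function with a guard for that case. The helper name read_csv_folder and the demo data are hypothetical, and returning an empty DataFrame for an empty folder is just one possible choice.

```python
import glob
import os
import tempfile

import pandas as pd


def read_csv_folder(folder):
    """Concatenate every regular *.csv file directly inside folder (hypothetical helper)."""
    pattern = os.path.join(folder, "*.csv")
    # Skip directories whose names happen to end in ".csv".
    paths = sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
    if not paths:
        # pd.concat([]) would raise "ValueError: No objects to concatenate".
        return pd.DataFrame()
    frames = []
    for p in paths:
        df = pd.read_csv(p, index_col=None, header=0)
        df["filename"] = os.path.basename(p)  # tag rows with their source file
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Small demo on a temporary folder with made-up data.
demo_dir = tempfile.mkdtemp()
for name, qty in [("a.csv", 1), ("b.csv", 2)]:
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write(f"item,qty\nx,{qty}\n")
os.mkdir(os.path.join(demo_dir, "nested.csv"))  # a directory that looks like a CSV

sale_df = read_csv_folder(demo_dir)
empty_df = read_csv_folder(tempfile.mkdtemp())
```

If the function returns an empty DataFrame for a folder you expect to contain data, printing the glob pattern and its matches is a quick way to check whether the path is what you think it is.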

Upvotes: 2

You call allFiles = glob.glob(path + "/*.csv") even though your path variable already ends with a forward slash. As a result, the call becomes allFiles = glob.glob("/Users/user/desktop/Sales//*.csv").
Check whether fixing that helps with your error.
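One way to avoid the doubled slash is to build the pattern with os.path.join rather than string concatenation. A minimal sketch, reusing the path from the question (the variable names naive_pattern and joined_pattern are made up here):

```python
import os

path = "/Users/user/desktop/Sales/"

# String concatenation keeps both slashes.
naive_pattern = path + "/*.csv"

# os.path.join does not add a second separator when path already ends with one.
joined_pattern = os.path.join(path, "*.csv")
```

Here naive_pattern is "/Users/user/desktop/Sales//*.csv" and joined_pattern is "/Users/user/desktop/Sales/*.csv". Note that POSIX systems generally treat repeated slashes inside a path as a single slash, so this is mostly a tidiness fix and may not change the error by itself.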

Upvotes: 0
