veg2020

Reputation: 1020

Row-wise concatenation of hundreds of csv files into single dataframe

I have hundreds of csv files - each corresponding to a unique chemical. All the csv files have the same format: three columns, with values in each column for every chemical.

I would like to combine all these files via a row-wise concatenation into a single pandas dataframe, but without repeating the header row from each csv file in the final dataframe. I am using the following Python code, but pd.read_csv raises an error -

"EmptyDataError: No columns to parse from file"

Code follows below.

import glob
import os

import pandas as pd

files_path = r"C:\Users\Desktop\Python\RWE_350files_merge\Drugs"
csvfiles = glob.glob(os.path.join(files_path, "*.csv"))

master_df = []
for file in csvfiles:
    df = pd.read_csv(file, header=0)
    master_df.append(df)

result = pd.concat(master_df, ignore_index=True)

I know the csv files are not "empty", because I can concatenate them successfully from the command line. However, that method keeps the header row from every csv file in the final concatenated output, which is not acceptable.

How can I fix this issue?

Upvotes: 0

Views: 1328

Answers (1)

J_H

Reputation: 20450

This looks fine:

    df = pd.read_csv(file, header=0)

But apparently some of your input files are empty. Adding a print(file) debug statement before the read would help you pinpoint which files are the empty ones.
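For instance, you can list every file alongside its size without reading any of them; empty files show up as size 0. This is a minimal sketch using a temporary directory as a stand-in for your Drugs folder (the file names and contents here are made up for illustration):

```python
import glob
import os
import tempfile

# Stand-in for the Drugs folder: two valid CSVs plus one empty file.
tmp = tempfile.mkdtemp()
for name, text in [("a.csv", "col1,col2,col3\n1,2,3\n"),
                   ("b.csv", ""),  # empty file -> EmptyDataError on read
                   ("c.csv", "col1,col2,col3\n4,5,6\n")]:
    with open(os.path.join(tmp, name), "w") as f:
        f.write(text)

for file in sorted(glob.glob(os.path.join(tmp, "*.csv"))):
    print(file, os.path.getsize(file))  # empty files print a size of 0
```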

You could Look Before You Leap:

    threshold = 2
    if os.path.getsize(file) > threshold:
        df = pd.read_csv(file, header=0)
        master_df.append(df)

Or you could decide that it is Easier To Ask Forgiveness Than Permission:

    try:
        df = pd.read_csv(file, header=0)
    except pd.errors.EmptyDataError:
        print(file, 'was empty. Continuing...')
        continue
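Putting the EAFP version together with the final concatenation, here is a runnable sketch; it uses a temporary directory with made-up file names and contents in place of your real Drugs folder:

```python
import glob
import os
import tempfile

import pandas as pd

# Stand-in for the Drugs folder: two valid CSVs plus one empty file.
tmp = tempfile.mkdtemp()
for name, text in [("a.csv", "col1,col2,col3\n1,2,3\n"),
                   ("b.csv", ""),  # empty
                   ("c.csv", "col1,col2,col3\n4,5,6\n")]:
    with open(os.path.join(tmp, name), "w") as f:
        f.write(text)

master_df = []
for file in sorted(glob.glob(os.path.join(tmp, "*.csv"))):
    try:
        df = pd.read_csv(file, header=0)
    except pd.errors.EmptyDataError:
        print(file, 'was empty. Continuing...')
        continue
    master_df.append(df)

# Only the non-empty files contribute rows; the header appears once.
result = pd.concat(master_df, ignore_index=True)
```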

Upvotes: 2
