Reputation: 1020
I have hundreds of CSV files, each corresponding to a unique chemical. All the files share the same format: three columns, with values in each column for that chemical.
I would like to combine all these files by row-wise concatenation into a single pandas DataFrame, without repeating the header row from each CSV file in the final DataFrame. I am using the following Python code, but pd.read_csv raises an error:
"EmptyDataError: No columns to parse from file"
Code follows below.
import glob
import os

import pandas as pd

files_path = r"C:\Users\Desktop\Python\RWE_350files_merge\Drugs"
csvfiles = glob.glob(os.path.join(files_path, "*.csv"))

master_df = []
for file in csvfiles:
    df = pd.read_csv(file, header=0)
    master_df.append(df)

result = pd.concat(master_df, ignore_index=True)
I know the CSV files are not "empty", since I can concatenate them successfully from the command line. However, that approach retains the header row from each CSV file in the final concatenated output, which is not acceptable.
How can I fix this issue?
Upvotes: 0
Views: 1328
Reputation: 20450
This looks fine:
df = pd.read_csv(file, header=0)
But apparently some of your input files are empty.
Adding a print(file) debug statement inside the loop would help you identify which particular files are empty.
You could Look Before You Leap:
threshold = 2
if os.path.getsize(file) > threshold:
    df = pd.read_csv(file, header=0)
    master_df.append(df)
Or you could decide that it is Easier To Ask Forgiveness Than Permission:
try:
    df = pd.read_csv(file, header=0)
except pd.errors.EmptyDataError:
    print(file, 'was empty. Continuing...')
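Putting it together, the EAFP approach might look like the sketch below. I've wrapped your loop in a helper function (the name concat_csvs is just for illustration) so the directory can be passed in; header=0 already ensures each file's header row becomes column names rather than a data row, so the headers won't repeat in the result.

```python
import glob
import os

import pandas as pd


def concat_csvs(files_path):
    """Row-wise concat of every CSV under files_path, skipping empty files."""
    frames = []
    for file in glob.glob(os.path.join(files_path, "*.csv")):
        try:
            # header=0 consumes each file's first row as column names,
            # so no header row ends up as data in the combined frame.
            frames.append(pd.read_csv(file, header=0))
        except pd.errors.EmptyDataError:
            print(file, "was empty. Continuing...")
    return pd.concat(frames, ignore_index=True)
```

Empty files are reported and skipped instead of crashing the whole run, and the non-empty ones are concatenated exactly as in your original code.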
Upvotes: 2