Camilla

Reputation: 131

Concatenate multiple CSV files from different folders into one CSV file in Python

I am trying to concatenate about 30 CSV files into a single file. The CSV files are located in different folders.

However, I have encountered an error while appending all files together: OSError: Initializing from file failed

Here is my code:

import os
import pandas as pd
import glob
 
path = 'xxx'
target_folders=['Apples', 'Oranges', 'Bananas','Raspberry','Strawberry', 'Blackberry','Gooseberry','Liche']
output ='yyy'
path_list = []
for idx in target_folders:
    lst_of_files = glob.glob(path + idx +'\\*.csv')
    latest_files = max(lst_of_files, key=os.path.getmtime)
    path_list.append(latest_files)
    df_list = [] 
    for file in path_list: 
        df = pd.read_csv(file) 
        df_list.append(df) 
    final_df = df.append(df for df in df_list) 
    combined_csv = pd.concat([pd.read_csv(f) for f in latest_files])

    combined_csv.to_csv(output + "combined_csv.csv", index=False)

    OSError                                   Traceback (most recent call last)
    <ipython-input-126-677d09511b64> in <module>
  1 df_list = []
  2 for file in latest_files:
  ----> 3     df = pd.read_csv(file)
  4     df_list.append(df)
  5 final_df = df.append(df for df in df_list)

    OSError: Initializing from file failed


    

Upvotes: 0

Views: 1404

Answers (3)

Thomas

Reputation: 2276

This solution should work like a charm for you:

import pandas as pd
import pathlib

data_dir = '/Users/thomasbryan/projetos/blocklist/files/'
out_dir = '.'

list_files = []
for filename in pathlib.Path(data_dir).glob('**/*.csv'):
    list_files.append(filename)

df = pd.concat(map(pd.read_csv, list_files), ignore_index=True)
df.to_csv(pathlib.Path(out_dir) / 'combined_csv.csv', index=False)
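
Note that the recursive glob above reads every CSV under data_dir, whereas the code in the question keeps only the most recently modified file in each target folder. In the question, latest_files holds a single path string, so looping over it yields one character at a time, which may be what makes read_csv fail. If you want the "latest file per folder" behaviour, a minimal sketch could look like this. The function name combine_latest_csvs and its parameters are hypothetical, chosen only for illustration:

```python
import pathlib

import pandas as pd


def combine_latest_csvs(base_dir, folder_names):
    """Concatenate the most recently modified CSV from each named subfolder."""
    latest_files = []
    for name in folder_names:
        csv_files = list((pathlib.Path(base_dir) / name).glob('*.csv'))
        if not csv_files:
            continue  # skip folders that contain no CSV file
        # max() returns a single Path; collect one path per folder in a list
        latest_files.append(max(csv_files, key=lambda p: p.stat().st_mtime))
    if not latest_files:
        raise FileNotFoundError('no CSV files found in the given folders')
    # Read each selected file exactly once and stack the rows
    frames = [pd.read_csv(f) for f in latest_files]
    return pd.concat(frames, ignore_index=True)
```

You would call it with your own base path and folder list, for example combine_latest_csvs(path, target_folders), and then write the result with to_csv(..., index=False) as above.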

Upvotes: 1

DaveB

Reputation: 452

Without seeing your CSV files it's hard to be sure, but I've come across this problem before with unusually formatted CSVs. The CSV parser may be having difficulty determining the structure of the files (separators, etc.).

Try df = pd.read_csv(file, engine = 'python')

From the docs: "The C engine is faster while the python engine is currently more feature-complete."

Try passing engine='python' when reading a single CSV file and see whether the read succeeds. That way you can narrow the problem down to either reading the files or traversing them.
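
One way to run that check is sketched below. It is only an illustration, and the function name read_csv_with_fallback is hypothetical. It tries the default C engine first and, if that raises, reports the file and retries with the Python engine:

```python
import pandas as pd


def read_csv_with_fallback(path):
    """Read one CSV with the default C engine, retrying with the Python engine."""
    try:
        return pd.read_csv(path)
    except (OSError, pd.errors.ParserError) as exc:
        # Report which file failed so the problem can be narrowed down
        print(f'C engine failed on {path!r}: {exc}; retrying with engine="python"')
        return pd.read_csv(path, engine='python')
```

If the retry fails as well, the exception propagates, and the printed path tells you which file (or which malformed path) to inspect.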

Upvotes: 0

Corralien

Reputation: 120409

Try to simplify your code:

import pandas as pd
import pathlib

data_dir = 'xxx'
out_dir = 'yyy'

data = []
for filename in pathlib.Path(data_dir).glob('**/*.csv'):
    df = pd.read_csv(filename)
    data.append(df)

# Concatenate the collected frames (the list, not the last frame read)
df = pd.concat(data, ignore_index=True)
df.to_csv(pathlib.Path(out_dir) / 'combined_csv.csv', index=False)

Upvotes: 0
