How to combine horizontally many CSV files using python csv or pandas module?

Question

Hello! I would like to combine many CSV files horizontally (the total number will be around 120-150) into one CSV file by taking one column from each file (in this case the column called “grid”). All the files have the same columns and the same number of rows (they are constructed the same way) and are stored in the same directory. I’ve tried the csv module and pandas. I don't want to list all 120 files by hand; I need a script that does it automatically. I’m stuck and out of ideas...

Some input CSV files (data) and CSV file (merged) which I would like to get: https://www.dropbox.com/transfer/AAAAAHClI5b6TPzcmW2dmuUBaX9zoSKYD1ZrFV87cFQIn3PARD9oiXQ

This is what my code looks like when I use the csv module:

import os
import glob
import csv

os.chdir('\csv_files_direction')

extension = 'csv'
files = [i for i in glob.glob('*.{}'.format(extension))]
out_merg = ('\merged_csv_file_direction')

with open(out_merg,'wt') as out:
    writer = csv.writer(out)
    for file in files:
        with open(file) as csvfile:
            data = csv.reader(csvfile, delimiter=';')
            result = []
            for row in data:
                a = row[3] #column which I need
                result.append(a)

With this code I get values only from the last CSV file; the rest are missing. As a result I would like to have the one specific column from each CSV file in the directory.

And with pandas:

import os
import glob
import pandas as pd
import csv

os.chdir('\csv_files_direction')

extension = 'csv'
files = [i for i in glob.glob('*.{}'.format(extension))]
out_merg = ('\merged_csv_file_direction')
in_names = [pd.read_csv(f, delimiter=';', usecols = ['grid']) for f in files]

With pandas I get the data from all the CSV files as a list, which can be indexed with e.g. in_names[1]. I confess that this is my first attempt with pandas and I don't know what my next step should be.

I would really appreciate any help! Thanks in advance, Mateusz

ragioniere96 · Accepted Answer

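For the csv part, I think you need another list defined OUTSIDE the loop. In your code, result is reset to an empty list for every file, so only the last file's values survive. Something like: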

import os
import sys
dirname = os.path.dirname(os.path.realpath('__file__'))
import glob
import csv


extension = 'csv'
files = [i for i in glob.glob('*.{}'.format(extension))]
out_merg = ('merged_csv_file_direction')

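result = []  # one list per input file, each holding that file's column
for file in files:
    with open(file, newline='') as csvfile:
        data = csv.reader(csvfile, delimiter=';')
        col = []
        for row in data:
            a = row[3]  # column which I need
            col.append(a)
        result.append(col)

# zip(*result) turns the per-file columns into output rows
with open(out_merg, 'wt', newline='') as out:
    writer = csv.writer(out)
    writer.writerows(zip(*result))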

NOTE: I have also changed how the script locates the folder. Now you can run the file directly in the folder that contains the two folders (one to read the data from and the other to save the result to).
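
To make the transposition step easier to see, here is a minimal self-contained sketch of the same csv-module idea. The function name merge_column and its parameters (in_dir, out_path, col_index) are hypothetical names chosen for illustration, and it assumes every file is ';'-delimited with the wanted column at index 3, as in the question.

```python
import csv
import glob
import os


def merge_column(in_dir, out_path, col_index=3, delimiter=';'):
    """Write column `col_index` of every CSV in `in_dir` side by side."""
    # Sort the paths so the column order in the output is reproducible.
    files = sorted(glob.glob(os.path.join(in_dir, '*.csv')))
    columns = []
    for path in files:
        with open(path, newline='') as f:
            reader = csv.reader(f, delimiter=delimiter)
            # One list per file, header cell included.
            columns.append([row[col_index] for row in reader])
    # zip(*columns) transposes the lists: one tuple per output row.
    with open(out_path, 'w', newline='') as out:
        csv.writer(out, delimiter=delimiter).writerows(zip(*columns))
    return files
```

Note that zip stops at the shortest column, so this relies on all files having the same number of rows, which the question says they do. It is also safer to write the merged file outside in_dir, otherwise a second run would pick it up as an input.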

For the pandas part, you don't need another loop. You need to concatenate the DataFrames that you have already created with in_names = [pd.read_csv(f, delimiter=';', usecols = ['grid']) for f in files], passing axis=1 so that they are joined side by side instead of stacked on top of each other. I think you can use

import os
import glob
import pandas as pd

os.chdir(r'\csv_files_direction')

extension = 'csv'
files = [i for i in glob.glob('*.{}'.format(extension))]
out_merg = (r'\merged_csv_file_direction')
in_names = [pd.read_csv(f, delimiter=';', usecols = ['grid']) for f in files]
# axis=1 places the 'grid' columns side by side (horizontally)
result = pd.concat(in_names, axis=1)
result.to_csv(out_merg, sep=';', index=False)
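
Since every file contributes a column with the same name, the merged file would otherwise contain 120-150 columns all called grid. A possible variation, sketched below, renames each column after the file it came from. The function name merge_grid_columns and the choice of the file name as the column label are my assumptions, not something the question asked for.

```python
import glob
import os

import pandas as pd


def merge_grid_columns(in_dir, out_path, column='grid', sep=';'):
    """Concatenate one column from every CSV in `in_dir` horizontally."""
    # Sort the paths so the column order in the output is reproducible.
    files = sorted(glob.glob(os.path.join(in_dir, '*.csv')))
    frames = []
    for path in files:
        df = pd.read_csv(path, sep=sep, usecols=[column])
        # Assumption: label each column with its source file name (no extension).
        stem = os.path.splitext(os.path.basename(path))[0]
        frames.append(df.rename(columns={column: stem}))
    # axis=1 joins the frames side by side, aligned on the row index.
    merged = pd.concat(frames, axis=1)
    merged.to_csv(out_path, sep=sep, index=False)
    return merged
```

You would call it with something like merge_grid_columns('csv_files_direction', 'merged.csv'), where both paths are placeholders for your own folders.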

Tell me if it works
