ModalBro

Reputation: 554

Merging all CSV files in a folder and adding a new column with the filename of the original file in Python

I am trying to merge all the CSV files in a folder into one large CSV file. I also need to add a new column to this merged CSV that shows which original file each row came from. This is the code I have so far:

import csv
import glob


read_files = glob.glob("*.csv")

source = []

with open("combined.files.csv", "wb") as outfile:
    for f in read_files:
        source.append(f)
        with open(f, "rb") as infile:
            outfile.write(infile.read())

I know I somehow have to repeat each f for as many rows as there are in each CSV and then append it as a new column in the .write call, but I am not sure how to do this. Thank you, everyone!

Upvotes: 0

Views: 3345

Answers (1)

tdelaney

Reputation: 77337

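If you add the filename as the final column, you don't need to parse the CSV at all. Just read each file line by line, append the filename, and write the result. And don't open the files in binary mode! In Python 3, text mode gives you each line as a str that can be combined with the filename string. (This shortcut assumes no field contains an embedded newline and no filename contains a comma; otherwise use the csv module.)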

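import glob
import os

out_filename = "combined.files.csv"
# Delete any previous output first; otherwise glob would pick it up as an input file.
if os.path.exists(out_filename):
    os.remove(out_filename)

read_files = glob.glob("*.csv")
with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            for line in infile:
                # Remove only the line ending, so whitespace inside the last field is kept.
                outfile.write('{},{}\n'.format(line.rstrip('\n'), filename))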
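If the files might contain quoted fields with embedded newlines, or filenames with commas, the line-by-line shortcut breaks. A minimal sketch of the same idea using the standard-library csv module (not part of the original answer; the function name combine_csvs is hypothetical) parses each row, appends the filename, and lets csv.writer handle quoting. It also skips the output file in the glob results instead of deleting it beforehand.

```python
import csv
import glob
import os


def combine_csvs(pattern="*.csv", out_filename="combined.files.csv"):
    """Merge all matching CSVs, appending the source filename as a last column."""
    # Exclude the output file so a re-run does not read its own previous output.
    read_files = [f for f in sorted(glob.glob(pattern))
                  if os.path.abspath(f) != os.path.abspath(out_filename)]
    # newline="" lets the csv module manage line endings itself.
    with open(out_filename, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for filename in read_files:
            with open(filename, newline="") as infile:
                for row in csv.reader(infile):
                    # csv.writer quotes the filename if it contains a comma.
                    writer.writerow(row + [filename])
```

Call combine_csvs() from the folder that holds the files. Sorting the glob results is an extra choice here so that the row order is predictable across runs.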

If your CSVs share a common header line, write it to the output file once and suppress it in the others:

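import os
import glob

want_header = True
out_filename = "combined.files.csv"

# Delete any previous output so glob doesn't treat it as an input file.
if os.path.exists(out_filename):
    os.remove(out_filename)

read_files = glob.glob("*.csv")

with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            if want_header:
                # First file: copy its header line and add the new column name.
                outfile.write('{},Filename\n'.format(next(infile).rstrip('\n')))
                want_header = False
            else:
                # Later files: skip their header (the default avoids StopIteration on an empty file).
                next(infile, None)
            for line in infile:
                outfile.write('{},{}\n'.format(line.rstrip('\n'), filename))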
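A csv-module version of the header variant might look like the sketch below. It is an illustration, not the answer's code; the function name combine_csvs_with_header and the column_name parameter are hypothetical. It assumes every file has the same header in the same column order (it does not check this), writes the first header once with a Filename column added, and skips empty files instead of failing on them.

```python
import csv
import glob
import os


def combine_csvs_with_header(pattern="*.csv", out_filename="combined.files.csv",
                             column_name="Filename"):
    """Merge CSVs that share a header row, writing that header only once."""
    # Skip the output file so it is never read as an input.
    read_files = [f for f in sorted(glob.glob(pattern))
                  if os.path.abspath(f) != os.path.abspath(out_filename)]
    header_written = False
    with open(out_filename, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for filename in read_files:
            with open(filename, newline="") as infile:
                reader = csv.reader(infile)
                # next() with a default returns None for an empty file.
                header = next(reader, None)
                if header is None:
                    continue
                if not header_written:
                    writer.writerow(header + [column_name])
                    header_written = True
                # Remaining rows get the source filename as an extra column.
                for row in reader:
                    writer.writerow(row + [filename])
```

Using a flag tied to the first non-empty file (rather than the first file in the list) means an empty CSV that sorts first does not produce a broken header line.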

Upvotes: 5
