Vegard Haugstvedt
Vegard Haugstvedt

Reputation: 11

Python: merge csv data with differing headers

I have a bunch of software output files that I have manipulated into csv-like text files. I have probably done this the hard way, because I am not too familiar with python library

The next step is to gather all this data in one single csv file. The files have different headers, or are sorted differently.

Lets say this is file A:

A | B | C | D | id 
0   2   3   2   "A"
...

and this is file B:

B | A | Z | D | id
4   6   1   0   "B"
...

I want the append.csv file to look like:

A | B | C | D | Z | id
0   2   3   2       "A"
6   4       0   1   "B"
...

How can I do this, elegantly? Thank you for all answers.

Upvotes: 1

Views: 160

Answers (2)

snakecharmerb
snakecharmerb

Reputation: 55599

The csv module in the standard library provides tools you can use to do this. The DictReader class produces a mapping of column name to value for each row in a csv file; the DictWriter class will write such mappings to a csv file.

DictWriter must be provided with a list of column names, but does not require all column names to be present in each row mapping.

import csv

list_of_files = ['1.csv', '2.csv']

# Collect the column names.
all_headers = set()
for file_ in list_of_files:
    with open(file_, newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)
        all_headers.update(headers)
all_headers = sorted(all_headers)

# Generate the output file.
with open('append.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=all_headers)
    writer.writeheader()
    for file_ in list_of_files:
        with open(file_, newline='') as f:
            reader = csv.DictReader(f)
            writer.writerows(reader)
$ cat append.csv
A,B,C,D,Z,id
0,2,3,2,,A
6,4,,0,1,B

Upvotes: 0

fabioconcina
fabioconcina

Reputation: 459

You can use pandas to read CSV files into DataFrames and use the concat method, then write the result to CSV:

import pandas as pd

df1 = pd.read_csv("file1.csv")
df2 = pd.read_csv("file2.csv")

df = pd.concat([df1, df2], axis=0, ignore_index=True)

df.to_csv("file.csv", index=False)

Upvotes: 1

Related Questions