Python- Arranging different rows of CSV file by constant column header

Question

I have a CSV file that is automatically updated with data in the following order:

A,B,C,D,E,F
4,2,6,4,8,9
D,C,A,B,E,F
6,4,5,8,6,2
E,F,A,C,D
4,2,7,6,5

As you can see, the header values appear in a different order in different rows. Sometimes one of the header columns is missing as well.

The requirement is to rearrange the data under a consistent header, with all values aligned below it, e.g.:

A,B,C,D,E,F
4,2,6,4,8,9
A,B,C,D,E,F
5,8,4,6,6,2
A,B,C,D,E,F
7, ,6,5,4,2

OR

    A,B,C,D,E,F
    4,2,6,4,8,9
    5,8,4,6,6,2
    7, ,6,5,4,2

I tried sorting it with the following code; however, it only sorts the first row and prints the remaining rows as they are.

with open('mycsv.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
    fieldnames = ['A','B','C','D','E','F','G']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in csv.DictReader(infile):
        writer.writerow(row)

Any pointers on how to achieve this would help. Thanks.

Patrick Artner · Accepted Answer

You can open your file and read it two lines at a time (header + data), creating a dict from each pair. Add each dict to a list that holds all of your data. Then collect the keys of all dicts into one set, sort them, and write all the data back under that single header.

For dicts that are missing a key, you can substitute an empty string for its value.
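As a small illustration of that substitution, here is a minimal sketch using `dict.get` with a default. The `record` and `keys` values are made up for the example and mirror the third record of the question's data, which has no `B` column:

```python
# one record that lacks the "B" column (illustrative data)
record = {"E": "4", "F": "2", "A": "7", "C": "6", "D": "5"}
keys = ["A", "B", "C", "D", "E", "F"]

# dict.get returns the default "" for keys the record does not contain
line = ",".join(record.get(k, "") for k in keys)
print(line)
```

The missing column simply becomes an empty field between two commas.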

Create data file:

with open("t.csv","w") as f:
    f.write("""A,B,C,D,E,F
4,2,6,4,8,9
D,C,A,B,E,F
6,4,5,8,6,2
E,F,A,C,D
4,2,7,6,5""")

Then:

# read in data as list of dicts, each dict contains 2 rows worth of data    
data = []
with open("t.csv") as f:
    while True:
        try:
            # get a header line and a data line
            header = next(f).strip().split(",")
            d = next(f).strip().split(",")
            # create a dict from it and append it to your data collection
            data.append( {k:v for k,v in zip(header,d)} )

        except StopIteration:
            print("done")
            break

# get a sorted set of all keys in all dicts:
keys = set()
for k in data:
    keys.update(k)
keys = sorted(keys)

# write the data again
with open("new_t.csv","w") as f:
    # write headers once
    f.write(",".join(keys))
    f.write("\n")
    for d in data:
        f.write(",".join( ( d.get(k,"") for k in keys  )))
        f.write("\n")

# check:
with open("new_t.csv","r") as f:
    print(f.read())

Resulting file:

A,B,C,D,E,F
4,2,6,4,8,9
5,8,4,6,6,2
7,,6,5,4,2

I use Python 3 style printing, but the code works the same in Python 2.7 and 3.x.
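The same idea can probably also be expressed with the standard `csv` module, which the question already uses. This is only a sketch of an alternative, not the code above: the function name `normalize` is hypothetical, and `io.StringIO` stands in for real files so the example is self-contained. `csv.DictWriter` fills missing columns with its `restval` argument.

```python
import csv
import io

def normalize(infile, outfile):
    """Read alternating header/data rows and write one consistent table."""
    rows = list(csv.reader(infile))
    # pair each header row (even index) with the data row that follows it
    records = [dict(zip(header, values))
               for header, values in zip(rows[0::2], rows[1::2])]
    # union of all column names across all records, sorted
    fieldnames = sorted(set().union(*records))
    # restval="" is written for any column a record does not have
    writer = csv.DictWriter(outfile, fieldnames=fieldnames,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)

# in-memory stand-ins for the input and output files (assumption for the demo)
source = io.StringIO(
    "A,B,C,D,E,F\n4,2,6,4,8,9\nD,C,A,B,E,F\n6,4,5,8,6,2\nE,F,A,C,D\n4,2,7,6,5\n"
)
result = io.StringIO()
normalize(source, result)
print(result.getvalue())
```

With real files you would pass the objects returned by `open(..., newline="")` instead of the `StringIO` buffers.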

Make sure that your source file contains only header + data row pairs and no empty lines; otherwise you have to adjust the code to skip empty lines.
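One possible adjustment, sketched under the assumption that blank lines can appear anywhere and carry no meaning: filter them out before pairing headers with data. The helper name `read_records` and the sample lines are made up for the example.

```python
def read_records(lines):
    """Pair header and data lines, ignoring empty or whitespace-only lines."""
    # drop blank lines first so the header/data pairing is not shifted
    cleaned = [line.strip() for line in lines if line.strip()]
    records = []
    # walk the remaining lines two at a time: header, then data
    for header, values in zip(cleaned[0::2], cleaned[1::2]):
        records.append(dict(zip(header.split(","), values.split(","))))
    return records

# illustrative input with blank lines between and inside the pairs
sample = ["A,B,C\n", "1,2,3\n", "\n", "C,A\n", "   \n", "9,8\n", ""]
records = read_records(sample)
print(records)
```

An open file object can be passed in place of `sample`, since iterating over it yields its lines.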
