Dryden Long

Reputation: 10182

Trouble with Python order of operations/loop

I have some code that is meant to convert CSV files into tab-delimited files. My problem is that I cannot figure out how to write the correct values in the correct order. Here is my code:

for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        tab_file.write(item['name']+'\t'+item['order_num']...)
        tab_file.write('\n'+item['amt_due']+'\t'+item['due_date']...)

Now, since both my write statements are in the for row in data loop, my headers are being written multiple times over.

If I outdent the first write statement, I'll have an obvious formatting error.
If I move the second write statement above the first and then outdent, my data will be out of order.

What can I do to make sure that the first write statement gets written once as a header, and the second gets written for each line in the CSV file? How do I extract the first 'write' statement outside of the loop without breaking the dictionary? Thanks!

Upvotes: 0

Views: 125

Answers (3)

Dryden Long

Reputation: 10182

OK, so I figured it out, but it's not the most elegant solution. Basically, I ran the first loop and wrote the header to the file, then ran the loop a second time and appended the data rows. See my code below. I would love any input on a better way to accomplish what I've done here. Thanks!

for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
    tab_file.write(item['name']+'\t'+item['order_num']...)
    tab_file.close()

for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        tab_file.write('\n'+item['amt_due']+'\t'+item['due_date']...)
    tab_file.close()

Upvotes: 0

Gareth Latty

Reputation: 89017

The csv module contains methods for writing as well as reading, making this pretty trivial:

import csv

with open("test.csv") as file, open("test_tab.csv", "w") as out:
    reader = csv.reader(file)
    writer = csv.writer(out, dialect=csv.excel_tab)
    for row in reader:
        writer.writerow(row)

No need to do it all yourself. Note my use of the with statement, which should always be used when working with files in Python.
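Applied to the question's setup of several input files and an export directory, the same idea might look like the sketch below. It assumes Python 3, and the function names `convert_to_tab` and `convert_all` are hypothetical, as is the assumption that the inputs arrive as a list of file paths.

```python
import csv
import os


def convert_to_tab(src_path, dst_path):
    """Copy every row of a comma-separated file into a tab-delimited file."""
    # newline="" lets the csv module manage line endings itself (Python 3).
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, dialect=csv.excel_tab)
        for row in reader:
            writer.writerow(row)
    # Both files are closed here, even if an exception was raised above.


def convert_all(paths, export_dir):
    """Convert each file in `paths`, keeping its base name, into `export_dir`."""
    for path in paths:
        convert_to_tab(path, os.path.join(export_dir, os.path.basename(path)))
```

Because each file pair is opened inside its own with block, no explicit close() call is needed per file.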

Edit: Naturally, if you want to select specific values, you can do that easily enough. You appear to be making your own dictionary to select the values - again, the csv module provides DictReader to do that for you:

import csv

with open("test.csv") as file, open("test_tab.csv", "w") as out:
    reader = csv.DictReader(file)
    writer = csv.writer(out, dialect=csv.excel_tab)
    for row in reader:
        writer.writerow([row["name"], row["order_num"], ...])

As kirelagin points out in the comments, the writer's writerows() method could also be used, here with a generator expression:

writer.writerows([row["name"], row["order_num"], ...] for row in reader)
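For a version that can be run as is, here is a minimal sketch using in-memory streams in place of files. The sample data and the `wanted` list are made up for illustration; only the column names come from the question. It assumes Python 3.

```python
import csv
import io

# Hypothetical sample input; the column names are taken from the question.
source = io.StringIO(
    "name,order_num,amt_due,due_date\r\n"
    "Ann,7,19.99,2013-05-01\r\n"
    "Bob,8,5.00,2013-06-15\r\n"
)
out = io.StringIO()

wanted = ["name", "order_num"]  # columns to keep, in output order

reader = csv.DictReader(source)  # consumes the header line as field names
writer = csv.writer(out, dialect=csv.excel_tab)

writer.writerow(wanted)  # header, written once
# One output row per input row, built lazily by the generator expression.
writer.writerows([row[col] for col in wanted] for row in reader)

result = out.getvalue()
```

The header is written with a single writerow() call before writerows() consumes the reader, so it cannot be repeated.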

Upvotes: 7

Óscar López

Reputation: 236034

Move the code that writes the headers outside the main loop, so that it is written exactly once, at the beginning.

Also, consider using the csv module for writing CSV files, not just for reading them. Don't reinvent the wheel!
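A rough sketch of the first point, keeping the question's hand-written approach: the header write sits before the loop and the data write sits inside it. The function name `write_tab_delimited` and its parameters are hypothetical, and the code assumes Python 3.

```python
import csv
import io


def write_tab_delimited(rows, out, columns):
    """Write one header line, then one tab-separated line per row."""
    # Header: outside the loop, so it is written exactly once.
    out.write("\t".join(columns) + "\n")
    for row in rows:
        # Data: inside the loop, so it is written once per input row.
        out.write("\t".join(row[col].strip() for col in columns) + "\n")


# Usage with made-up data; csv.DictReader builds the per-row dictionaries.
source = io.StringIO("name,amt_due\n Ann ,19.99\nBob,5.00\n")
out = io.StringIO()
write_tab_delimited(csv.DictReader(source), out, ["name", "amt_due"])
text = out.getvalue()
```

Note that joining by hand does not quote values that themselves contain tabs or newlines, which is one reason to prefer csv.writer for the output as well.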

Upvotes: 5
