byoungdale
byoungdale

Reputation: 181

How do I make the filename of the output csv file equal to the the content of a column

I have a huge csv file with all our student rosters inside of it. So, 1) I want to separate the rosters into smaller csv files based on the course name. 2) If I can have the output csv file's name be equal to the course name (example: Algebra1.csv), that would make my life so much better. Is it possible to iterate through the courses_column of the csv file and when the name of the course changes it makes a new csv file for that course. I think I could read the keys of the dictionary 'read_rosters' and then do a while loop?

An example of the csv input file would look like this:

Student firstname, Student lastname, Class Instructor, Course name, primary learning center

johnny, doe, smith, algebra1, online

jane, doe, austin, geometry, campus

Here is what I have so far:

import os
import csv

path = "/PATH/TO/FILE"

with open(os.path.join(path, "student_rosters.csv"), "rU") as rosters:
        read_rosters = csv.DictReader(rosters)
        for row in read_rosters:
            course_name = row['COURSES_COLUMN_HEADER']
            csv_file = os.path.join(course_name, ".csv")
            course_csv = csv.writer(open(csv_file, 'wb').next()

Upvotes: 0

Views: 3758

Answers (2)

Blckknght
Blckknght

Reputation: 104722

In your current code, you're opening an output csv file for each line you read. This will be slow, and, as you've currently written it, it won't work. That's because using the "wb" mode when you open the file erases everything that was in the file before. You might use an "a" mode, but this will still be slow.

How you can best solve the problem depends a bit on your data. If you can rely upon the input always having the rows with the same course next to one another, you could use groupby from the itertools module to easily write the appropriate lines out together:

from itertools import groupby
from operator import itemgetter

with open(os.path.join(path, "student_rosters.csv"), "rb") as rosters:
    reader = csv.DictReader(rosters)
    for course, rows in groupby(reader, itemgetter('COURSES_COLUMN_HEADER')):
        with open(os.path.join(path, course + ".csv"), "wb") as outfile:
            writer = csv.DictWriter(outfile, reader.fieldnames)
            writer.writerows(rows)

If you can't rely upon the organization of the rows, you have a couple options. One would be to read all the rows into a list, then sort them by course and use itertools.groupby like in the code above.

Another option would be to keep reading just one line at a time, with each output row going into an appropriate file. I'd suggest keeping a dictionary of writer objects, indexed by course name. Here's what that could look like:

writers = {}
with open(os.path.join(path, "student_rosters.csv"), "rb") as rosters:
    reader = csv.DictReader(rosters)
    for row in reader:
        course = row['COURSES_COLUMN_HEADER']
        if course not in writers:
            outfile = open(os.path.join(path, course + ".csv"), "wb")
            writers[course] = csv.DictWriter(outfile, reader.fieldnames)
        writers[course].writerow(row)

If you were using this in production, you'd probably want to add some code to close the files after you were done with them, since you can't use with statements to close them automatically.

In my example codes above, I've made the code write out the full rows, just as they were in the input. If you don't want that, you can change the second argument to DictWriter to a sequence of the column names you want to write. You'll also want to include the parameter extrasaction="ignore" so that the extra values in the row dicts will be ignored when the columns you do want are written.

Upvotes: 1

John1024
John1024

Reputation: 113864

First, this is not what you want:

csv_file = os.path.join(course_name, ".csv")

It will create a file named .csv in a subdirectory named course_name. You likely want something like:

csv_file = os.path.join(path, course_name + ".csv")

Also, the following has two issues: (a) unbalanced parens and (b) writer objects don't have a next method:

course_csv = csv.writer(open(csv_file, 'wb').next()

Try instead:

course_csv = csv.writer(open(csv_file, 'wb'))

And, then you need to write something of your choosing to the new file, probably using a writeheader, writerow or writerows method:

course_csv.writeheader(something_of_your_choosing)
course_csv.writerow(something_else_of_your_choosing)

Upvotes: 0

Related Questions