KoolKid

Reputation: 59

How to select every Nth row in CSV file using python

I have a CSV file with hundreds of rows. I would like to split it into consecutive groups of 3 rows and export each group to a new CSV file, with each output file named after the first row of its group.

For example, given the following CSV file:

1980 10 12            
1  2  3  4  5  6  7       
4  6  8  1  0  8  6  
1981 10 12
2  4  9  7  5  4  1  
8  9  3  8  3  7  3

I would like to select the first 3 rows and export them to a new CSV file named "1980 10 12" after the first row, then select the next 3 rows and export them to a new CSV file named "1981 10 12" after the first row of that group, and so on. I would like to do this using Python.

Upvotes: 3

Views: 6654

Answers (3)

user2379410

Reputation:

Using slight iterator trickery:

with open('in.csv', 'r') as infh:
    for block in zip(*[infh]*3):
        filename = block[0].strip() + '.csv'
        with open(filename, 'w') as outfh:
            outfh.writelines(block)

On Python 2.X you would use itertools.izip. The docs actually mention izip(*[iter(s)]*n) as an idiom for clustering a data series.
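To see what the idiom does without touching any files, here is a small sketch on an in-memory list. The sample lines are made up for illustration, and the trailing incomplete block is an assumption added to show one caveat: zip silently drops leftover items that do not fill a whole group, whereas itertools.zip_longest keeps them and pads with None.

```python
from itertools import zip_longest

# Hypothetical sample lines; the last one starts an incomplete block.
lines = [
    "1980 10 12\n", "1  2  3\n", "4  6  8\n",
    "1981 10 12\n", "2  4  9\n", "8  9  3\n",
    "1982 10 12\n",
]

# The same iterator object appears three times in the argument list,
# so zip pulls three consecutive items for every tuple it builds.
it = iter(lines)
blocks = list(zip(*[it] * 3))

# zip stops as soon as one input is exhausted, so the leftover
# "1982 10 12" line is dropped. zip_longest keeps it, padded with None.
it = iter(lines)
padded = list(zip_longest(*[it] * 3))
```

If the input is not guaranteed to be a multiple of 3 lines, the zip_longest variant (filtering out the None entries before writing) may be the safer choice.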

Upvotes: 0

Padraic Cunningham

Reputation: 180481

import csv
with open("in.csv") as f:
    reader = csv.reader(f)
    chunks = []
    for ind, row in enumerate(reader, 1):
        chunks.append(row)
        if ind % 3 == 0: # if we have three new rows, create a file using the first row as the name
            with open("{}.csv".format(chunks[0][0].strip()), "w") as f1:
                wr = csv.writer(f1) 
                wr.writerows(chunks) # write all rows
            chunks = [] # reset chunks to an empty list

Upvotes: 2

Martijn Pieters

Reputation: 1123450

Using the csv module, plus itertools.islice() to select 3 rows each time:

import csv
import os.path
from itertools import islice


with open(inputfilename, 'rb') as infh:
    reader = csv.reader(infh)
    for row in reader:
        filename = row[0].replace(' ', '_') + '.csv'
        filename = os.path.join(directory, filename)
        with open(filename, 'wb') as outfh:
            writer = csv.writer(outfh)
            writer.writerow(row)
            writer.writerows(islice(reader, 2))

The writer.writerows(islice(reader, 2)) line takes the next 2 rows from the reader, copying them across to the writer CSV, after writing the current row (with the date) to the output file first.
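The interplay between the for loop and islice() can be seen in isolation with a plain iterator. This is only an illustrative sketch; the row values are made-up placeholders rather than real CSV data.

```python
from itertools import islice

# Hypothetical rows standing in for what csv.reader would yield.
rows = iter(["1980 10 12", "a", "b", "1981 10 12", "c", "d"])

groups = []
for header in rows:
    # islice consumes the next two items from the same iterator,
    # so the for loop resumes at the following header row.
    groups.append([header] + list(islice(rows, 2)))
```

Because both the loop and islice() advance the same iterator, no index arithmetic is needed to find where each group starts.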

You may need to adjust the delimiter argument for the csv.reader() and csv.writer() objects; the default is a comma, but you didn't specify the exact format and perhaps you need to set it to a '\t' tab instead.
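As a rough sketch of what that adjustment looks like, assuming the data turns out to be tab-separated (the sample text below is invented for the example, and lineterminator='\n' is set only so the output matches the input exactly):

```python
import csv
import io

# Hypothetical tab-separated input standing in for the real file.
sample = "1980 10 12\n1\t2\t3\n4\t6\t8\n"

# Pass the same delimiter to both the reader and the writer.
reader = csv.reader(io.StringIO(sample), delimiter='\t')
rows = list(reader)

out = io.StringIO()
writer = csv.writer(out, delimiter='\t', lineterminator='\n')
writer.writerows(rows)
```

With a tab delimiter the date line stays a single field, since the spaces inside it are not treated as separators.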

If you are using Python 3, open the files in text mode ('r' and 'w') and set newline='' for both: open(inputfilename, 'r', newline='') and open(filename, 'w', newline='').
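Putting those Python 3 adjustments together, a version of the approach might look like the sketch below. The function name split_every_three, the blank-line check, and the temporary-directory demo with comma-separated sample data are all assumptions added for illustration; they are not part of the original code.

```python
import csv
import os
import tempfile
from itertools import islice


def split_every_three(input_path, out_dir):
    """Write each 3-row block of input_path to its own CSV in out_dir."""
    written = []
    with open(input_path, 'r', newline='') as infh:
        reader = csv.reader(infh)
        for row in reader:
            if not row:  # assumption: skip blank lines between blocks
                continue
            name = row[0].strip().replace(' ', '_') + '.csv'
            path = os.path.join(out_dir, name)
            with open(path, 'w', newline='') as outfh:
                writer = csv.writer(outfh)
                writer.writerow(row)                 # the date row
                writer.writerows(islice(reader, 2))  # the next two rows
            written.append(path)
    return written


# Small demo on made-up comma-separated data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'in.csv')
    with open(src, 'w', newline='') as fh:
        fh.write("1980 10 12\n1,2,3\n4,6,8\n1981 10 12\n2,4,9\n8,9,3\n")
    created = [os.path.basename(p) for p in split_every_three(src, tmp)]
    with open(os.path.join(tmp, '1980_10_12.csv'), newline='') as fh:
        first = list(csv.reader(fh))
```

Returning the list of written paths is just a convenience for checking the result; drop it if you only need the files.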

Upvotes: 3
