d.grassi84

Reputation: 393

Python generator to read large CSV file

I need to write a Python generator that yields tuples (X, Y) coming from two different CSV files.

It should receive a batch size on init, read the two CSVs line by line, and yield a tuple (X, Y) for each line, where X and Y are arrays holding the columns of that line in each CSV file.

I've looked at examples of lazy reading but I'm finding it difficult to convert them for CSVs:

Also, unfortunately Pandas Dataframes are not an option in this case.

Any snippet I can start from?

Thanks

Upvotes: 14

Views: 23758

Answers (1)

jotasi

Reputation: 5177

You can write a generator that reads rows from two csv readers in parallel and yields each pair of rows as a pair of arrays. The code for that is:

import csv
import numpy as np

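def getData(filename1, filename2):
    # Python 3: open in text mode with newline="", as the csv module recommends
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        reader1 = csv.reader(csv1)
        reader2 = csv.reader(csv2)
        # zip is lazy in Python 3, so only one row per file is held at a time
        for row1, row2 in zip(reader1, reader2):
            # This will give arrays of floats, for other types change dtype
            # (np.float was removed in NumPy 1.24; the builtin float is equivalent)
            yield (np.array(row1, dtype=float),
                   np.array(row2, dtype=float))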

for tup in getData("file1", "file2"):
    print(tup)
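
The question also asks for a batch size, which the generator above does not take. A minimal sketch of a batched variant follows. The name `get_batches` and the `batch_size` parameter are hypothetical, and the sketch assumes Python 3, that every row within a file has the same number of columns, and that all values parse as floats.

```python
import csv
from itertools import islice

import numpy as np


def get_batches(filename1, filename2, batch_size):
    """Yield (X, Y) pairs of 2-D float arrays with up to batch_size rows each.

    Hypothetical helper: assumes rectangular, purely numeric CSV files.
    """
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        # Lazily pair up the rows of both files
        pairs = zip(csv.reader(csv1), csv.reader(csv2))
        while True:
            # Pull at most batch_size row pairs from the paired iterator
            chunk = list(islice(pairs, batch_size))
            if not chunk:
                break  # both files (or the shorter one) are exhausted
            rows1, rows2 = zip(*chunk)
            yield np.array(rows1, dtype=float), np.array(rows2, dtype=float)
```

Calling `get_batches("file1", "file2", 32)` would then yield X and Y arrays of 32 rows each, with a shorter final batch if the row count is not a multiple of 32. Like `zip`, it stops at the end of the shorter file.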

Upvotes: 30
