Reputation: 393
I need to write a Python generator that yields tuples (X, Y) coming from two different CSV files.
It should receive a batch size on init, read the two CSVs line by line, and yield a tuple (X, Y) for each line, where X and Y are arrays holding the columns of that line in each file.
I've looked at examples of lazy reading, but I'm finding it difficult to adapt them to CSVs.
Also, unfortunately pandas DataFrames are not an option in this case.
Any snippet I can start from?
Thanks
Upvotes: 14
Views: 23758
Reputation: 5177
You can write a generator that reads rows from two different csv readers and yields them as pairs of arrays. The code for that is:
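import csv
import numpy as np

def getData(filename1, filename2):
    # On Python 3 the csv module expects text-mode files opened with newline=""
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        reader1 = csv.reader(csv1)
        reader2 = csv.reader(csv2)
        # zip pairs the rows lazily and stops at the end of the shorter file
        for row1, row2 in zip(reader1, reader2):
            # np.float was removed in NumPy 1.24; the builtin float works everywhere
            yield (np.array(row1, dtype=float),
                   np.array(row2, dtype=float))
            # This will give arrays of floats, for other types change dtype

for tup in getData("file1", "file2"):
    print(tup)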
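The question also mentions a batch size, which the generator above does not use. Here is a minimal sketch of one way to add it, using itertools.islice to pull a fixed number of row pairs at a time. The name batched_pairs and its batch_size parameter are my own, not from the question. It assumes neither file has a header row, that every row in a file has the same number of columns, and that all values parse as floats.

```python
import csv
from itertools import islice

import numpy as np


def batched_pairs(filename1, filename2, batch_size):
    """Yield (X, Y) tuples of 2-D float arrays with up to batch_size rows each.

    Hypothetical helper: assumes no header rows and numeric columns only.
    """
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        # zip is lazy on Python 3, so rows are only read as they are consumed
        rows = zip(csv.reader(csv1), csv.reader(csv2))
        while True:
            # Take at most batch_size row pairs from the shared iterator
            chunk = list(islice(rows, batch_size))
            if not chunk:
                break
            # Split the list of (row1, row2) pairs into two tuples of rows
            x_rows, y_rows = zip(*chunk)
            yield (np.array(x_rows, dtype=float),
                   np.array(y_rows, dtype=float))
```

The last batch may be shorter than batch_size if the number of rows is not a multiple of it. Iterating works the same way as before, e.g. for X, Y in batched_pairs("file1", "file2", 32).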
Upvotes: 30