Reputation: 526
I'm trying to fill lists of values from a large CSV file (around 250,000 lines), but it's taking ages. I'm sure there is a way to make the process faster, but I don't know how!
Here is the code:
import csv
import numpy as np
energy = []
ondeIG = []
time = []
envelope = []
with open('file.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        time = np.hstack([time, row['Time']])
        energy = np.hstack([energy, row['Energy']])
        ondeIG = np.hstack([ondeIG, row['OndeIG']])
        envelope = np.hstack([envelope, row['envelope']])
Thank you!
Upvotes: 1
Views: 87
Reputation: 30288
np.hstack() constructs a new ndarray on every call, which is expensive. Update the lists in place with append() instead:
with open('file.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        time.append(row['Time'])
        energy.append(row['Energy'])
        ondeIG.append(row['OndeIG'])
        envelope.append(row['envelope'])
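Note that csv.DictReader yields strings, so if you need numeric NumPy arrays afterwards, convert each list once after the loop. A minimal sketch (the inline sample data stands in for the CSV rows):

```python
import numpy as np

# cheap per-row appends into a plain Python list
time = []
for value in ["0.0", "0.1", "0.2"]:  # stand-in for row['Time'] values
    time.append(value)

# one conversion and one allocation at the end,
# instead of a new ndarray per row as with np.hstack
time = np.array(time, dtype=float)
```

This keeps the loop O(n) overall, whereas rebuilding the array each iteration is O(n^2).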
Upvotes: 2
Reputation: 13218
To import data from CSV files, have a look at pandas, and more specifically at pandas.read_csv().
Your current code takes tremendous time because you rebuild an array (four arrays, even) at each iteration.
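A minimal sketch of the read_csv approach, assuming the column names from the question (a StringIO stands in for 'file.csv' here; in practice you would pass the filename directly):

```python
import io
import pandas as pd

# stand-in for open('file.csv'); read_csv also accepts a filename
csv_data = io.StringIO(
    "Time,Energy,OndeIG,envelope\n"
    "0.0,1.5,0.2,0.9\n"
    "0.1,1.7,0.3,0.8\n"
)

# parses the whole file in one pass, inferring numeric dtypes
df = pd.read_csv(csv_data)

# each column is already backed by a NumPy array
time = df['Time'].to_numpy()
energy = df['Energy'].to_numpy()
```

read_csv parses and allocates in bulk, so it is typically far faster than a Python-level loop over rows.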
Upvotes: 0