member555

Reputation: 807

Fastest way to create a numpy array from text file

I have a 60 MB file with many lines.

Each line has the following format:

(x,y)

Each line will be parsed as a numpy vector of shape (1, 2).

At the end, everything should be concatenated into a big numpy array of shape (N, 2), where N is the number of lines.

What is the fastest way to do that? Right now it takes too much time (more than 30 minutes).

My Code:

points = None
with open(fname) as f:
    for line in f:
        point = parse_vector_string_to_array(line)
        if points is None:
            points = point
        else:
            points = np.vstack((points, point))

Where the parser is:

def parse_vector_string_to_array(string):
    x, y = eval(string)
    array = np.array([[x, y]])
    return array

Upvotes: 3

Views: 4078

Answers (1)

hpaulj

Reputation: 231335

One thing that would improve speed is to imitate genfromtxt and accumulate each line in a list of lists (or tuples), then build the array with a single np.array call at the end. Repeated np.vstack is quadratic: it copies the whole accumulated array on every iteration, whereas list appends are amortized constant time.

for example (roughly):

points = []
with open(fname) as f:
    for line in f:
        x, y = eval(line)
        points.append((x, y))
result = np.array(points)

Since your file lines look like tuples, I'll leave your eval parsing. We don't usually recommend eval, but in this limited case it might be the simplest.
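If you do want to avoid eval, ast.literal_eval is a drop-in replacement here: it parses Python literals like tuples but refuses arbitrary code. A minimal sketch (the sample lines stand in for your file's contents):

```python
import numpy as np
from ast import literal_eval

# Sample lines standing in for the file; each is a "(x,y)" tuple literal.
lines = ["(1.0,2.0)", "(3.5,4.5)"]

# literal_eval parses the tuple safely; arbitrary expressions raise ValueError.
points = [literal_eval(line) for line in lines]
result = np.array(points)
print(result.shape)  # (2, 2)
```

For a 60 MB file this still loops in Python, but the single np.array call at the end avoids the quadratic vstack cost.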

You could try to make genfromtxt read this, but the () on each line will give some headaches.
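One way around those headaches: genfromtxt accepts any iterable of lines, so you can strip the parentheses on the fly with a generator before the parser sees them. A rough sketch (StringIO stands in for your open file):

```python
import numpy as np
from io import StringIO

# StringIO stands in for the real file object.
text = "(1.0,2.0)\n(3.5,4.5)\n"
f = StringIO(text)

# Strip the surrounding "()" from each line before genfromtxt parses it.
cleaned = (line.strip().strip("()") for line in f)
result = np.genfromtxt(cleaned, delimiter=",")
print(result.shape)  # (2, 2)
```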

pandas is supposed to have a faster csv reader, but I don't know if it can be configured to handle this format or not.
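read_csv won't understand the parentheses natively, but one workaround is to let it split on the commas and then strip the stray "(" and ")" from the first and last columns afterwards. A sketch, with StringIO standing in for the file:

```python
import pandas as pd
from io import StringIO

# StringIO stands in for the real file.
text = "(1.0,2.0)\n(3.5,4.5)\n"

# Splitting on "," leaves "(1.0" in column 0 and "2.0)" in column 1,
# both read as strings; strip the parens and convert to float.
df = pd.read_csv(StringIO(text), header=None)
df[0] = df[0].str.lstrip("(").astype(float)
df[1] = df[1].str.rstrip(")").astype(float)
result = df.to_numpy()
print(result.shape)  # (2, 2)
```

The C parser in read_csv is much faster than a Python-level loop, so for a 60 MB file this may be worth the extra cleanup step.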

Upvotes: 2
