Ivan

Reputation: 20101

Container list in Python: standard list vs numpy array

I'm writing some Python code that needs to store and access a list of different kinds of elements. Each element of this list may be of a different class type. For example:

def file_len(fname):
    i = 0
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

import numpy as np

element_list = []
data = np.loadtxt(filename)

if file_len(filename) == 1:
    # A one-line file loads as a 1-D array, so index it directly.
    param1 = data[0]
    param2 = data[1]
    element_list.append(Class1.Class1(param1, param2))
else:
    # Otherwise each row of the 2-D array describes one element.
    for field in data:
        param1 = field[0]
        param2 = field[1]
        element_list.append(Class1.Class1(param1, param2))

Later I will need to call the methods of the Class1 objects stored in element_list many times, but the list itself will not need to be modified:

result = 0.0
for i in xrange(10000):
    for element in element_list:
        # calculate_result returns a complex value; only the real part is summed
        result += element.calculate_result(i).real

Is there an efficient way to do this?

Thanks!

Upvotes: 2

Views: 948

Answers (2)

steveha

Reputation: 76775

This is not a full answer, but I spotted two things I could contribute.

Here is an improved version of file_len(). This one will return 0 if the file is zero-length. Your function returns 1 for a zero-length file, and 1 for a file with one line.

def file_len(fname):
    i = 0
    with open(fname) as f:
        for i, l in enumerate(f, 1):
            pass
    return i
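As a quick check of the zero-length case, the function can be run against a few temporary files. This is only a sketch: count_for is a hypothetical helper introduced here for the test, and the file contents are made up.

```python
import os
import tempfile


def file_len(fname):
    """Count the lines in a file; an empty file gives 0."""
    i = 0
    with open(fname) as f:
        for i, _line in enumerate(f, 1):
            pass
    return i


def count_for(text):
    """Hypothetical helper: write text to a temporary file and count its lines."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        return file_len(path)
    finally:
        os.remove(path)


empty_count = count_for("")
one_line_count = count_for("1.0 2.0\n")
three_line_count = count_for("1 2\n3 4\n5 6\n")
```

Because enumerate() starts counting at 1 here, the loop variable already holds the line count when the loop ends, and it keeps its initial value of 0 when the loop body never runs.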

Here is a faster way to do the compute loop.

result = sum(e.calculate_result(i).real for i in xrange(10000) for e in element_list)

It might be possible to make it even faster using reduce(), but I don't think it can be much faster. The big savings with reduce() come when you can avoid binding names over and over, but here we need to bind the name e so we can call e.calculate_result(i).real, since the type of e could be anything.

If you could do something like this it might be a bit faster.

import itertools as it
import operator as op
result = reduce(op.add, it.imap(SomeClass.calculate_something, it.product(element_list, xrange(10000))))

Again, the main savings is to avoid binding names. it.product() returns tuples that include (e, i) where e is an element from element_list and i is a number from xrange(10000). Then it.imap() will call the function and pass the tuple as an argument. Then reduce() will sum everything. Actually, just calling sum() is probably as good as reduce(op.add) but you could try it both ways and see if either one is slightly faster than the other. If you can figure out something sensible for SomeClass.calculate_something then maybe you can make this work.
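One way this could look, as a sketch only: it assumes every element is an instance of the same class, which the question says may not be the case. Element and its formula are hypothetical stand-ins for Class1. It uses it.starmap() so that each (e, i) tuple is unpacked into the two arguments of the unbound method, and op.attrgetter("real") to pick out the real part without a Python-level function. It is written with range() and map() so that it also runs on Python 3, where reduce() lives in functools.

```python
import itertools as it
import operator as op
from functools import reduce


class Element(object):
    """Hypothetical stand-in for the question's Class1."""

    def __init__(self, param1, param2):
        self.param1 = param1
        self.param2 = param2

    def calculate_result(self, i):
        # Placeholder formula; the real calculation is not shown in the question.
        return complex(self.param1 * i, self.param2)


element_list = [Element(1, 2), Element(3, 4)]
n = 100


def real_parts():
    """Real part of calculate_result for every (element, i) pair."""
    pairs = it.product(element_list, range(n))
    # starmap unpacks each (e, i) tuple into Element.calculate_result(e, i).
    values = it.starmap(Element.calculate_result, pairs)
    return map(op.attrgetter("real"), values)


result_reduce = reduce(op.add, real_parts())
result_sum = sum(real_parts())
```

Note that calling Element.calculate_result directly bypasses any override in a subclass, so this only fits when the list really is homogeneous. Timing both variants with timeit on the real data is the only way to know whether either one is worth it.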

Hmm, it might be worth trying just letting sum() (or reduce()) compute a complex sum, and then throw away the imaginary part when the sum is done. Would that be faster than accessing the .real attribute once per value? I'm not sure, but it might help you make the reduce() version work.
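The two ways of getting the real part can be compared side by side. This is a minimal sketch with a hypothetical Element class and a made-up formula standing in for Class1; it shows only that both give the same total for this input, not which one is faster.

```python
class Element(object):
    """Hypothetical stand-in for the question's Class1."""

    def __init__(self, param1, param2):
        self.param1 = param1
        self.param2 = param2

    def calculate_result(self, i):
        # Placeholder formula; the real calculation is not shown in the question.
        return complex(self.param1 * i, self.param2)


element_list = [Element(1, 2), Element(3, 4)]
n = 100

# Variant 1: take .real on every value, then sum the floats.
real_each = sum(e.calculate_result(i).real
                for i in range(n) for e in element_list)

# Variant 2: sum the complex values, then take .real once at the end.
real_once = sum(e.calculate_result(i)
                for i in range(n) for e in element_list).real
```

The values here are whole numbers, so the two totals match exactly. With general floating-point data they may differ in the last few digits.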

EDIT:

You should try running your program under PyPy.

http://pypy.org/

If you do that, be sure to use this line instead of the first one I showed:

result = sum(e.calculate_result(i).real for e in element_list for i in xrange(10000))

This way, you are using each element e for 10000 calls in a row, which should help the PyPy just-in-time compiler ("JIT") to produce better code. I don't know if the JIT will help with only 10000 calls or not, but it seems like that should be the way to try it.
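The difference between the two lines is only the order of the for clauses, which decides which loop is the outer one. A tiny illustration with made-up placeholder values:

```python
elements = ["a", "b"]
steps = range(3)

# First form: i is the outer loop, so the elements alternate on every call.
i_outer = [(e, i) for i in steps for e in elements]

# Second form: e is the outer loop, so each element runs through every i in a row.
e_outer = [(e, i) for e in elements for i in steps]
```

Both forms visit the same set of (e, i) pairs, so the sum is the same up to floating-point rounding; only the order of the calls changes.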

Upvotes: 1

Henry Gomersall

Reputation: 8712

You could have each object write its result into a view of a shared NumPy array, passing that view in at instantiation. If you're accessing the data more often than you're calling class methods to update it, this should work.

Something like the following...

def file_len(fname):
    i = 0
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

import numpy as np

element_list = []
data = np.loadtxt(filename)

array_idx = 0

# length_of_data is the number of elements that will be in element_list
result_array = np.zeros(length_of_data, dtype='complex128')

if file_len(filename) == 1:
    param1 = data[0]
    param2 = data[1]
    # Pass a one-element view so the object writes straight into result_array.
    element_list.append(Class1.Class1(param1, param2,
                                      result_array[array_idx:array_idx+1]))
    array_idx += 1
else:
    for field in data:
        param1 = field[0]
        param2 = field[1]
        element_list.append(Class1.Class1(param1, param2,
                                          result_array[array_idx:array_idx+1]))
        array_idx += 1

Inside the class you'd then update the view directly. Consider this minimal example:

import numpy

a = numpy.zeros(5, dtype='complex128')

class Foo(object):

    def __init__(self, real, imag, array_view):
        self._array_view = array_view
        self._array_view[:] = real + 1j*imag #<--- The [:] is needed


element_list = []
for n in range(0, len(a)):
    element_list.append(Foo(n, n+1, a[n:n+1]))

print(a)
print(numpy.sum(a))

Upvotes: 0
