Michael B. Currie

Reputation: 14668

numpy vs list for non-numerical data

Numpy arrays for numerical data clearly work great, but is it slower to use them for non-numerical data?

For instance, say I have some nested lists of text data:

mammals = ['dog', 'cat', 'rat']
birds = ['stork', 'robin', 'penguin']

animals1 = [mammals, birds]

When accessing and manipulating this data is this list of nested lists going to be faster than the numpy array equivalent?

import numpy as np
animals2 = np.array(animals1)

Since numpy arrays are implemented as "strided" arrays in which every element has the same fixed size, a list of mostly short strings with a few long ones will use a disproportionate amount of memory when converted to a numpy array: each slot is padded to the length of the longest string. But what about speed?
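To illustrate the question's memory point, this sketch (with an illustrative word list, not the animals above) shows how numpy sizes a string dtype to the longest element:

```python
import numpy as np

# NumPy stores strings in a fixed-width dtype sized to the longest element,
# so every slot reserves as much space as the longest string needs.
words = np.array(['a', 'b', 'supercalifragilistic'])
print(words.dtype)     # <U20: every element reserves 20 unicode code points
print(words.itemsize)  # 80: bytes per element (4 bytes per code point)
```

Here the one-character strings 'a' and 'b' each occupy 80 bytes, the same as the 20-character string.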

Upvotes: 3

Views: 2867

Answers (1)

Justin O Barber

Reputation: 11601

As @JoshAdel has pointed out, you should become familiar with the timeit module. I believe you are asking about this comparison:

>>> import timeit
>>> timeit.timeit('[[x.upper() for x in y] * 10000 for y in animals1]', setup="mammals = ['dog', 'cat', 'rat']\nbirds = ['stork', 'robin', 'penguin']\nanimals1 = [mammals, birds]", number=10000)
1.7549941045438686
>>> timeit.timeit("numpy.char.upper(animals2)", setup="import numpy\nmammals = ['dog', 'cat', 'rat']\nbirds = ['stork', 'robin', 'penguin']\nanimals1 = [mammals, birds] * 10000\nanimals2=numpy.array(animals1)", number=10000)
221.09816223832195

I updated the test based on your comment. It's a good question, but to see how numpy.char performs you may just need to try some other operations with it yourself. Its source points to a compiled .pyd (DLL-type) extension module that exposes a _vec_string function.
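Note that the two statements above do slightly different amounts of string work (the list version replicates its result with `* 10000`, while the numpy version uppercases the full replicated array). A like-for-like sketch, in which both sides uppercase the same 20,000 × 3 block, might look like this (timings vary by machine, so none are shown):

```python
import timeit

# Shared setup: the same replicated data for both timings.
setup = """
import numpy
mammals = ['dog', 'cat', 'rat']
birds = ['stork', 'robin', 'penguin']
animals1 = [mammals, birds] * 10000
animals2 = numpy.array(animals1)
"""

# Pure Python: call str.upper() on every element.
t_list = timeit.timeit('[[x.upper() for x in y] for y in animals1]',
                       setup=setup, number=100)

# NumPy: numpy.char.upper still loops over Python-level strings internally.
t_numpy = timeit.timeit('numpy.char.upper(animals2)',
                        setup=setup, number=100)

print(t_list, t_numpy)
```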

Clearly there is a difference between the two snippets above, with numpy taking over 100 times longer to execute a numpy.char.upper() operation than Python takes to execute the built-in .upper() string method.

timeit is very simple to use for small snippets of code like this.

Upvotes: 5
