Reputation: 14668
Numpy arrays clearly work great for numerical data, but are they slower for non-numerical data?
For instance, say I have some nested lists of text data:
mammals = ['dog', 'cat', 'rat']
birds = ['stork', 'robin', 'penguin']
animals1 = [mammals, birds]
When accessing and manipulating this data, is the nested list going to be faster than the numpy array equivalent?
import numpy as np
animals2 = np.array(animals1)
Since numpy arrays are implemented as "strided" arrays where every element occupies the same fixed number of bytes, a "sparse" list of mostly short strings with a few long ones will use up a disproportionate amount of memory if converted to a numpy array, because every slot gets padded to the length of the longest string. But what about speed?
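To make the memory point concrete, here is a quick check (assuming Python 3, where numpy stores text as a fixed-width <U dtype at 4 bytes per character; exact byte counts may differ by numpy version):
import numpy as np

dense = np.array(['dog', 'cat', 'rat'])
sparse = np.array(['dog', 'cat', 'x' * 100])  # one long outlier

# Each element is stored fixed-width, padded to the longest string:
print(dense.dtype, dense.nbytes)    # <U3   -> 3 elements * 3 chars * 4 bytes   = 36 bytes
print(sparse.dtype, sparse.nbytes)  # <U100 -> 3 elements * 100 chars * 4 bytes = 1200 bytes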
Upvotes: 3
Views: 2867
Reputation: 11601
As @JoshAdel has pointed out, you should become familiar with the timeit module. I believe you are asking about this comparison:
>>> import timeit
>>> timeit.timeit('[[x.upper() for x in y] * 10000 for y in animals1]', setup="mammals = ['dog', 'cat', 'rat']\nbirds = ['stork', 'robin', 'penguin']\nanimals1 = [mammals, birds]", number=10000)
1.7549941045438686
>>> timeit.timeit("numpy.char.upper(animals2)", setup="import numpy\nmammals = ['dog', 'cat', 'rat']\nbirds = ['stork', 'robin', 'penguin']\nanimals1 = [mammals, birds] * 10000\nanimals2=numpy.array(animals1)", number=10000)
221.09816223832195
I updated the test based on your comment. The question is a good one, but you might just need to try some other operations with numpy.char to figure out how it performs. The source file points to a .pyd (DLL-type) file with a _vec_string function.
Clearly there is a difference between the two snippets of code above, with numpy taking over 100 times longer to execute a numpy.char.upper() operation than Python takes to execute the .upper() string method.
timeit is very simple to use for small snippets of code like this.
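One caveat with the comparison above: the two snippets don't do quite the same work. The list version uppercases only the six original strings and then replicates the result with * 10000, while the numpy version uppercases all 60,000 elements. As a minimal sketch of a like-for-like rerun (assuming Python 3.5+ for the globals= argument; timings will of course vary by machine):
import timeit
import numpy as np

mammals = ['dog', 'cat', 'rat']
birds = ['stork', 'robin', 'penguin']
animals1 = [mammals, birds] * 10000   # same 20000 x 3 data for both tests
animals2 = np.array(animals1)

# globals=globals() lets the timed statements see the names defined above,
# avoiding the setup-string gymnastics used earlier.
list_time = timeit.timeit('[[x.upper() for x in y] for y in animals1]',
                          globals=globals(), number=100)
char_time = timeit.timeit('np.char.upper(animals2)',
                          globals=globals(), number=100)
print(f'list comprehension: {list_time:.3f}s   np.char.upper: {char_time:.3f}s')
In my experience numpy.char still comes out behind the plain list comprehension on a fair run, since it has to build a new Python string object for every element anyway, but the gap is much smaller than the mismatched runs above suggest.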
Upvotes: 5