Reputation: 1298
Sample code
import numpy as np
import time

class A:
    def __init__(self, n):
        self.n = n

    def str_n(self):
        return str(self.n)

idx = np.asarray(list(range(30000)))
l_a = []
for i in range(400000):
    l_a.append(A(i))
l_a_arr = np.asarray(l_a)
l_a_str_arr = np.asarray([i.str_n() for i in l_a])

# Time fancy indexing on the string array
s_time = time.time()
l_a_idx_str_arr = l_a_str_arr[idx].tolist()
cost_time = time.time() - s_time
print("String array cost time is ", cost_time)

# Time fancy indexing on the object array
s_time = time.time()
l_a_idx_arr = l_a_arr[idx].tolist()
cost_time = time.time() - s_time
print("Class array cost time is ", cost_time)
The logs:
String array cost time is 0.0014674663543701172
Class array cost time is 0.0003917217254638672
UPDATE
Repeated each measurement 1000 times, both with and without tolist():
import numpy as np
import time

class A:
    def __init__(self, n):
        # Large offset so that every string is about 100 characters long
        self.inner_n = n + 111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111

    def str_n(self):
        return str(self.inner_n)

idx = np.asarray(list(range(30000)))
l_a = []
for i in range(400000):
    l_a.append(A(i))
l_a_arr = np.asarray(l_a)
l_a_str_arr = np.asarray([i.str_n() for i in l_a])

# String array, indexing followed by tolist()
avg_time = []
for i in range(1000):
    s_time = time.time()
    l_a_idx_str_arr = l_a_str_arr[idx].tolist()
    cost_time = time.time() - s_time
    avg_time.append(cost_time)
print("String array 1000 average cost time with tolist is", np.average(avg_time))

# Object array, indexing followed by tolist()
avg_time1 = []
for i in range(1000):
    s_time = time.time()
    l_a_idx_arr = l_a_arr[idx].tolist()
    cost_time = time.time() - s_time
    avg_time1.append(cost_time)
print("Class array 1000 average cost time with tolist is", np.average(avg_time1))

# String array, indexing only
avg_time2 = []
for i in range(1000):
    s_time = time.time()
    l_a_idx_str_arr = l_a_str_arr[idx]
    cost_time = time.time() - s_time
    avg_time2.append(cost_time)
print("String array 1000 average cost time is", np.average(avg_time2))

# Object array, indexing only
avg_time3 = []
for i in range(1000):
    s_time = time.time()
    l_a_idx_arr = l_a_arr[idx]
    cost_time = time.time() - s_time
    avg_time3.append(cost_time)
print("Class array 1000 average cost time is", np.average(avg_time3))
The logs:
String array 1000 average cost time with tolist is 0.0037294850349426267
Class array 1000 average cost time with tolist is 0.00030662870407104493
String array 1000 average cost time is 0.0014972503185272216
Class array 1000 average cost time is 0.0001489844322204589
I thought an array of strings was just a special case of an array of objects, so why does indexing it take more time?
Reputation: 231385
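Object dtype arrays are like lists: they store references to Python objects. Indexing them is nearly as fast as indexing a list, because only the references are copied.
String dtype arrays store the characters themselves in a fixed-width buffer inside the array, just as numeric arrays store their values. Indexing individual elements is slower since it requires a conversion from the numpy bytes to Python strings ('unboxing'). Indexing with an index array also has to copy every element's full buffer rather than a single reference, so the cost grows with the string width.
Arrays are best used 'whole', with operations that act on the entire array, rather than element by element.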
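To make the storage difference concrete, here is a minimal sketch that inspects both kinds of array. It reuses the class `A` from the question on a smaller array; the names `objs`, `obj_arr`, `str_arr`, `picked_objs` and `picked_strs` are made up for this illustration, and the exact byte counts assume a typical 64-bit build.

```python
import numpy as np

class A:
    def __init__(self, n):
        self.n = n

    def str_n(self):
        return str(self.n)

objs = [A(i) for i in range(1000)]
obj_arr = np.asarray(objs)                       # object dtype: holds references
str_arr = np.asarray([o.str_n() for o in objs])  # unicode dtype: holds characters

# Object arrays store one pointer per element (8 bytes on a 64-bit build).
print(obj_arr.dtype, obj_arr.itemsize)

# String arrays store 4 bytes per character, padded to the longest string.
# The longest string here is "999", so each element takes 3 * 4 = 12 bytes.
print(str_arr.dtype, str_arr.itemsize)

idx = np.arange(10)
picked_objs = obj_arr[idx]   # copies 10 references
picked_strs = str_arr[idx]   # copies 10 fixed-width character buffers

# The object array hands back the very same Python object.
print(picked_objs[0] is objs[0])

# The string array has to build a new string object from its buffer.
print(type(picked_strs[0]))           # a numpy string scalar
print(type(picked_strs.tolist()[0]))  # a plain Python str after tolist()

# Bytes copied by each fancy-indexing operation.
print(picked_objs.nbytes, picked_strs.nbytes)
```

With the roughly 100-character strings from the UPDATE, each string element occupies about 400 bytes versus one pointer per object element, which is consistent with the string array being slower even without tolist().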