Reputation: 863
I have a dictionary id_to_phone. It contains around 350,000 unique ids (dictionary keys), each representing a unique phone number (dictionary value). My requirement is to get the phone number for the ids generated by my code. My code generates around 10,000 to 50,000 ids, and for each id I need to find the matching phone number and store it in an array. I used the following code:
count = phone_id.shape[0]
phone_array = np.array([])
for i in range(count):
    phone = id_to_phone[phone_id[i]]
    phone_array = np.append(phone_array, phone)
But this code is taking a very long time. Is there any way to optimize my code?
Upvotes: 2
Views: 50
Reputation: 13088
Your problem is not the dictionary lookup but the np.append. NumPy arrays are fixed-size contiguous blocks of memory; appending beyond the current size requires the entire memory block to be resized and moved (copied somewhere else), which takes a long time. If you're doing just a few appends it doesn't matter, but doing lots of appends is likely to grow the array beyond what was originally allocated. From the docs:

Returns: A copy of arr with values appended to axis. Note that append does not occur in-place: a new array is allocated and filled. If axis is None, out is a flattened array.

So every call to np.append copies the entire array; no wonder it takes a long time.
Use a regular Python list instead; appending to a list is amortized constant time.
import timeit
import numpy as np

def np_append():
    arr = np.asarray([])
    for i in range(5000):
        arr = np.append(arr, i)

def list_append():
    ls = []
    for i in range(5000):
        ls.append(i)

if __name__ == "__main__":
    print(timeit.repeat('np_append()', number=10, repeat=3, globals=globals()))
    print(timeit.repeat('list_append()', number=10, repeat=3, globals=globals()))
The timings are as follows:
np_append : [0.15639284392818809, 0.15938732610084116, 0.15667122812010348]
list_append : [0.003160736057907343, 0.004024225985631347, 0.003376785898581147]
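Applying this to the code in the question, you can do all the lookups into a plain list (or a list comprehension) and convert to a NumPy array once at the end. A minimal sketch, using small stand-in data in place of the question's 350,000-entry dictionary:

```python
import numpy as np

# Hypothetical stand-ins for the question's data: an id -> phone mapping
# and an array of ids generated by the code.
id_to_phone = {1: "5551234567", 2: "5559876543", 3: "5550001111"}
phone_id = np.array([2, 1, 3, 2])

# Look everything up into a list, then convert to an array in one step.
phone_array = np.array([id_to_phone[i] for i in phone_id])

print(phone_array)  # ['5559876543' '5551234567' '5550001111' '5559876543']
```

This does a single allocation for the final array instead of one copy per appended element.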
Alternatively, if you know how many elements will be added, you can preallocate that much space for the numpy array using, for instance, phone_array = np.zeros((15000, 10)) for 15,000 10-digit phone numbers.
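A minimal sketch of the preallocation approach, assuming (as an illustration) that the phone numbers are stored as integers and the number of ids is known up front:

```python
import numpy as np

# Hypothetical stand-ins for the question's data.
id_to_phone = {1: 5551234567, 2: 5559876543, 3: 5550001111}
phone_id = np.array([2, 1, 3, 2])

# Allocate the full array once, then fill it in place: no per-element copies.
phone_array = np.empty(len(phone_id), dtype=np.int64)
for i, pid in enumerate(phone_id):
    phone_array[i] = id_to_phone[pid]

print(phone_array)  # [5559876543 5551234567 5550001111 5559876543]
```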
Upvotes: 4