Reputation: 110143
Is the following ever done in Python to minimize the "allocation time" of creating new objects in a for loop? Or is this considered bad practice / is there a better alternative?
for row in rows:
    data_saved_for_row = []  # re-initialized every iteration (takes a while)
    for item in row:
        do_something()
    do_something_with_row()
vs. the "c-version" --
data_saved_for_row = []
for row in rows:
for index, item in enumerate(row):
do_something()
data_saved_for_row[index + 1] = '\0' # now we have a crude way of knowing
do_something_with_row() # when it ends without having
# to always reinitialize
Normally the second approach seems like a terrible idea, but I've run into situations, when iterating over a million+ items, where the time spent re-initializing the list:
data_saved_for_row = []
adds up to a second or more.
Here's an example:
>>> print timeit.timeit(stmt="l = list();", number=int(1e8))
7.77035903931
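In case it helps, here is a rough benchmark sketch of the two patterns. The row sizes and the loop bodies are made up purely for illustration, and the absolute numbers will vary by machine and Python version:

import timeit

# Hypothetical data: 1000 rows of 100 items each.
setup = "rows = [list(range(100)) for _ in range(1000)]"

# Pattern 1: allocate a fresh list for every row.
fresh = """
for row in rows:
    data_saved_for_row = []
    for item in row:
        data_saved_for_row.append(item)
"""

# Pattern 2: allocate once, empty it in place between rows.
reused = """
data_saved_for_row = []
for row in rows:
    del data_saved_for_row[:]   # works on Python 2 and 3; list.clear() on 3.3+
    for item in row:
        data_saved_for_row.append(item)
"""

print(timeit.timeit(stmt=fresh, setup=setup, number=100))
print(timeit.timeit(stmt=reused, setup=setup, number=100))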
Upvotes: 0
Views: 92
Reputation: 24691
If you want this sort of performance, you may as well just write it in C yourself and import it with ctypes or something. But then, if you're writing this kind of performance-driven application, why are you using Python to do it in the first place?
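For what it's worth, the ctypes route looks roughly like this; libfastrow.so and process_row are hypothetical names for a shared library and function you would have to write and compile yourself:

import ctypes

# Hypothetical: a shared library you compiled yourself, e.g.
#   gcc -shared -fPIC -o libfastrow.so fastrow.c
lib = ctypes.CDLL("./libfastrow.so")

# Declare the signature of the (hypothetical) C function:
#   void process_row(const int *items, size_t n);
lib.process_row.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_size_t]
lib.process_row.restype = None

for row in rows:
    buf = (ctypes.c_int * len(row))(*row)   # pack the row into a C int array
    lib.process_row(buf, len(row))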
You can use list.clear() as a middle ground here, without having to reallocate anything immediately:
data_saved_for_row = []
for row in rows:
    data_saved_for_row.clear()
    for item in row:
        do_something()
    do_something_with_row()
but this isn't a perfect solution, as shown by the CPython source for it (comments omitted):
static int
_list_clear(PyListObject *a)
{
    Py_ssize_t i;
    PyObject **item = a->ob_item;
    if (item != NULL) {
        i = Py_SIZE(a);
        Py_SIZE(a) = 0;
        a->ob_item = NULL;
        a->allocated = 0;
        while (--i >= 0) {
            Py_XDECREF(item[i]);
        }
        PyMem_FREE(item);
    }
    return 0;
}
I'm not perfectly fluent in C, but this code looks like it frees the memory backing the list, so that memory will have to be reallocated as soon as you start adding to the list again anyway. This strongly implies that the Python language just doesn't natively support your approach.
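You can see that effect from Python itself with sys.getsizeof (the exact byte counts are a CPython implementation detail and will differ between builds):

import sys

data = list(range(1000))
print(sys.getsizeof(data))   # includes the buffer for ~1000 element pointers

data.clear()
print(sys.getsizeof(data))   # back to roughly the empty-list size: the buffer was freed

data.append(0)
print(sys.getsizeof(data))   # growing again forces a fresh allocation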
Or you could write your own Python data structure (as a subclass of list, maybe) that implements this paradigm (never actually clearing its own list, but maintaining a continuous notion of its own length), which might be a cleaner solution to your use case than implementing it in C.
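A minimal sketch of what that could look like (the class and method names here are made up, and it glosses over plenty of edge cases):

class ReusableRow(list):
    """Never shrinks its underlying storage; tracks a logical length instead."""

    def __init__(self):
        super(ReusableRow, self).__init__()
        self.length = 0                    # number of currently valid items

    def push(self, item):
        if self.length < len(self):
            self[self.length] = item       # overwrite an old slot: no new allocation
        else:
            self.append(item)              # only grow past the high-water mark
        self.length += 1

    def reset(self):
        self.length = 0                    # "clear" without freeing the buffer

    def valid_items(self):
        return self[:self.length]          # just the items written since reset()


data_saved_for_row = ReusableRow()
for row in rows:
    data_saved_for_row.reset()
    for item in row:
        data_saved_for_row.push(item)
    do_something_with_row(data_saved_for_row.valid_items())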
Upvotes: 1