Reputation: 21
I have a number of lists that correspond to each other like this:
ID_number = [1, 2, 3, 4, 5, 6, ...]
x_pos = [43.2, 53.21, 34.2, ...]
y_pos = [32.1, 42.1, 8.2, ...]
z_pos = [1.3, 67.1, 24.3, ...]
etc.
I want to be able to sort, pull, and perform operations on the data based on the ID_number, so I want to make a dictionary from these lists like this,
dictionary = {'id1':[x_pos1, y_pos1, z_pos1], 'id2':[x_pos2, y_pos2, z_pos2], ...}
where the key is the ID number and the value is a list containing the corresponding data for that ID number. How would I go about doing this efficiently in Python?
Upvotes: 2
Views: 151
Reputation: 101989
Use zip twice:
>>> ids = [1,2,3,4]
>>> x_pos = [1.32, 2.34, 5.56, 8.79]
>>> y_pos = [1.2, 2.3, 3.4, 4.5]
>>> z_pos = [3.33, 2.22, 10.98, 10.1]
>>> dict(zip(ids, zip(x_pos, y_pos, z_pos)))
{1: (1.32, 1.2, 3.33), 2: (2.34, 2.3, 2.22), 3: (5.56, 3.4, 10.98), 4: (8.79, 4.5, 10.1)}
Timing comparison with the genexp:
>>> import timeit
>>> timeit.timeit('dict(zip(ids, zip(x_pos, y_pos, z_pos)))', 'from __main__ import ids, x_pos, y_pos, z_pos')
1.6184730529785156
>>> timeit.timeit('dict((x[0], x[1:]) for x in zip(ids, x_pos, y_pos, z_pos))', 'from __main__ import ids, x_pos, y_pos, z_pos')
2.5186140537261963
So, using zip twice is about 1.5x faster than using the generator expression. The results obviously depend on the size of the iterables, but I'm quite confident that the double zip, at least on CPython 2, will always be faster than explicit loops. Generator expressions or for loops require much more work from the interpreter than the single call to zip, which removes some overhead from the iteration process.
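To check the comparison on your own machine, something like the following sketch could be used. It assumes Python 3; the function names double_zip and genexp and the repeat count are illustrative choices, not part of the original measurement, and the absolute timings will differ from the figures quoted above.

```python
import timeit

ids = [1, 2, 3, 4]
x_pos = [1.32, 2.34, 5.56, 8.79]
y_pos = [1.2, 2.3, 3.4, 4.5]
z_pos = [3.33, 2.22, 10.98, 10.1]


def double_zip():
    # Inner zip groups the coordinates, outer zip pairs them with the ids.
    return dict(zip(ids, zip(x_pos, y_pos, z_pos)))


def genexp():
    # Single zip, then split each tuple into key and value in Python code.
    return dict((x[0], x[1:]) for x in zip(ids, x_pos, y_pos, z_pos))


# Both approaches build the same dictionary.
assert double_zip() == genexp()

# The repeat count is arbitrary (an assumption for this sketch).
t_zip = timeit.timeit(double_zip, number=100_000)
t_gen = timeit.timeit(genexp, number=100_000)
print(f"double zip: {t_zip:.3f}s  genexp: {t_gen:.3f}s")
```

Only the ratio between the two numbers is meaningful, and it may shift with the length of the lists and the interpreter version.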
Using itertools.izip instead of zip doesn't change the timings much, but it is a lot more memory efficient for big data sets.
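For readers on Python 3, a minimal sketch of the same idea follows. It relies on the fact that itertools.izip was removed in Python 3 and the built-in zip already returns a lazy iterator there, so no import is needed. The variable names pairs and positions are chosen for this example only.

```python
ids = [1, 2, 3, 4]
x_pos = [1.32, 2.34, 5.56, 8.79]
y_pos = [1.2, 2.3, 3.4, 4.5]
z_pos = [3.33, 2.22, 10.98, 10.1]

# On Python 3, zip() produces items on demand, like izip did on Python 2.
pairs = zip(ids, zip(x_pos, y_pos, z_pos))
is_lazy = iter(pairs) is pairs  # True for an iterator, False for a list

# dict() consumes the iterator one (key, value) pair at a time.
positions = dict(pairs)
print(positions[3])
```

The intermediate list of tuples is never built; only the final dictionary is held in memory.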
Upvotes: 4
Reputation: 15463
dictionary = {'id' + str(i): [x, y, z]
              for i, x, y, z in zip(ID_number, x_pos, y_pos, z_pos)}
For large data sets, this is probably faster with itertools.izip().
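As a rough illustration of how a dictionary built this way could then be used for the pulling, sorting, and operating the question asks about, here is a small sketch. It assumes Python 3 and uses the sample values from the question; the names by_z and shifted_x are hypothetical.

```python
ID_number = [1, 2, 3]
x_pos = [43.2, 53.21, 34.2]
y_pos = [32.1, 42.1, 8.2]
z_pos = [1.3, 67.1, 24.3]

dictionary = {'id' + str(i): [x, y, z]
              for i, x, y, z in zip(ID_number, x_pos, y_pos, z_pos)}

# Pull the data for one ID.
x2, y2, z2 = dictionary['id2']

# Sort the keys by the z coordinate (index 2 of each value).
by_z = sorted(dictionary, key=lambda k: dictionary[k][2])

# Perform an operation per ID, here shifting every x by 1.0.
shifted_x = {k: v[0] + 1.0 for k, v in dictionary.items()}

print(by_z)
```

Note that string keys such as 'id10' sort before 'id2'; keeping the integer ID as the key avoids that if ordering by ID matters.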
Upvotes: 0
Reputation: 169143
zip() is quite useful to accomplish this. For example:
>>> ID_number = [1,2,3]
>>> x_pos = [43.2, 53.21, 34.2]
>>> y_pos = [32.1, 42.1, 8.2]
>>> z_pos = [1.3, 67.1, 24.3]
>>> dict((x[0], x[1:]) for x in zip(ID_number, x_pos, y_pos, z_pos))
{1: (43.200000000000003, 32.100000000000001, 1.3), 2: (53.210000000000001, 42.100000000000001, 67.099999999999994), 3: (34.200000000000003, 8.1999999999999993, 24.300000000000001)}
If the data set is quite large, you can avoid zip()'s creation of an entirely new copy of the whole data set by using itertools.izip() instead. This function returns an iterator that provides each zipped element when requested instead of holding the whole new structure in memory. (The result will be the same, but it should be faster on larger data sets.)
>>> import itertools
>>> dict((x[0], x[1:]) for x in itertools.izip(ID_number, x_pos, y_pos, z_pos))
{1: (43.200000000000003, 32.100000000000001, 1.3), 2: (53.210000000000001, 42.100000000000001, 67.099999999999994), 3: (34.200000000000003, 8.1999999999999993, 24.300000000000001)}
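If the values should be lists rather than tuples, as in the question, one possible Python 3 variant uses starred unpacking in a dict comprehension. This is a sketch of an alternative, not the approach shown above; the names positions and id_ are made up for the example.

```python
ID_number = [1, 2, 3]
x_pos = [43.2, 53.21, 34.2]
y_pos = [32.1, 42.1, 8.2]
z_pos = [1.3, 67.1, 24.3]

# "id_, *rest" binds the first element of each zipped tuple to id_
# and collects the remaining elements into a list.
positions = {id_: rest for id_, *rest in zip(ID_number, x_pos, y_pos, z_pos)}

print(positions[1])
```

Because zip is already lazy on Python 3, this does not build an intermediate copy of the data.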
Upvotes: 2