Reputation: 608
I have a long list of tuples that I need to turn into a nested list structure.
The long list of tuples is a list of data structured like this:
[(0.8, A), (-0.4, B), (1.0, C), (0.5, D), (-0.7, E)]
And I have a list of lengths like this:
[2, 2, 1]
My goal is to end with a nested list like this:
[[(0.8, A), (-0.4, B)], [(1.0, C), (0.5, D)], [(-0.7, E)]]
Essentially, the length list tells how many tuples from the list of tuples go into each nested list, but I can't figure out how to do this.
Upvotes: 0
Views: 126
Reputation: 21
I compared the performance of two of the approaches. The first uses an iterator (with next):
import timeit

tuples = [(0.8, "a"), (-0.4, "b"), (1.0, "c"), (0.5, "d"), (-0.7, "e")]
nums = [2, 2, 1]

def collect(tuples, nums):
    tuples_iter = iter(tuples)
    res = []
    for num in nums:
        batch = [next(tuples_iter) for i in range(num)]
        res.append(batch)
    return res

times = 1000000
res = timeit.timeit('collect(tuples, nums)', globals=globals(), number=times)
print(res, res/times)
Running it shows the following time:
$ py3 test.py
1.6941756070009433 1.6941756070009433e-06
The second one uses direct access via list slicing:
import timeit

tuples = [(0.8, "a"), (-0.4, "b"), (1.0, "c"), (0.5, "d"), (-0.7, "e")]
nums = [2, 2, 1]

def collect(x, lns):
    res = []
    start = 0
    for ln in lns:
        res.append(x[start:start+ln])
        start += ln
    return res

times = 1000000
res = timeit.timeit('collect(tuples, nums)', globals=globals(), number=times)
print(res, res/times)
which gives the runtime:
$ py3 test2.py
0.6738436879822984 6.738436879822985e-07
We see a clear performance difference even with a small tuple list. Let's generate a long one by replacing the initial data with:
tuples = [(0.8, "a") for i in range(10000)]
nums = [2 for i in range(5000)]
(it doesn't matter whether it is same tuple or different ones, and I generated same size pattern for simplicity)
also, I reduced number of "runs" to 10000, otherwise it would take a while to wait. so here is the result:
$ py3 test.py
45.54160467500333 0.0045541604675003325
$ py3 test2.py
18.69937555701472 0.001869937555701472
Looks like the solution with the "direct access" is 3 times faster.
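A side note on methodology: a single timeit.timeit call can be skewed by whatever else the machine is doing, so timeit.repeat combined with min() is a common refinement. A sketch of the same measurement for the slicing version (the collect name matches the second script above):

```python
import timeit

tuples = [(0.8, "a"), (-0.4, "b"), (1.0, "c"), (0.5, "d"), (-0.7, "e")]
nums = [2, 2, 1]

def collect(x, lns):
    # same slicing approach as in the second script above
    res = []
    start = 0
    for ln in lns:
        res.append(x[start:start + ln])
        start += ln
    return res

times = 100000
# repeat the whole measurement 5 times and keep the fastest run,
# which is the one least distorted by background load
runs = timeit.repeat('collect(tuples, nums)', globals=globals(),
                     number=times, repeat=5)
print(min(runs) / times)
```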
Update
Well, to test the "pop" method we have to reinstantiate the state before each run, because it mutates its input. For consistency I implemented all 3 approaches within the same framework:
from timeit import default_timer as timer

def prepare():
    tuples = [(0.8, "a") for i in range(10000)]
    nums = [2 for i in range(5000)]
    return tuples, nums

def collect_direct(x, lns):
    res = []
    start = 0
    for ln in lns:
        res.append(x[start:start+ln])
        start += ln
    return res

def collect_iter(tuples, nums):
    tuples_iter = iter(tuples)
    res = []
    for num in nums:
        batch = [next(tuples_iter) for i in range(num)]
        res.append(batch)
    return res

def collect_pop(source, pattern):
    res = []
    for size in pattern:
        # pop(0) takes the first element out of the list
        res.append([source.pop(0) for x in range(size)])
    return res

def test(func, times):
    total = 0
    for i in range(times):
        tuples, nums = prepare()
        start = timer()
        func(tuples, nums)
        total += timer() - start
    return total

times = 1000
print("Times", times)
print("Iter")
res = test(collect_iter, times)
print(res, res/times)
print("Direct")
res = test(collect_direct, times)
print(res, res/times)
print("Pop")
res = test(collect_pop, times)
print(res, res/times)
And here is the output:
$ py3 test4.py
Times 1000
Iter
2.9237239362555556 0.0029237239362555558
Direct
1.5656779609271325 0.0015656779609271325
Pop
22.700828287401237 0.022700828287401238
This is expected: pop(0) not only touches every element, like the "iteration" approach, but also removes the first element from the list on every call. As the experiments suggest, Python lists are not linked lists but dynamic arrays, which is why indexed (direct) access is so fast; popping from the front forces all remaining elements to shift one position left, making each pop(0) an O(n) operation.
Conclusion: direct access is the winner. Moreover, its advantage should grow with larger chunks: for a pattern like [3, 3, 3, ...] I would expect "direct access" to be roughly 3 times faster than "iter", and for [5, 5, 5, ...] roughly 5 times faster (though I didn't check this).
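As a final sketch (not part of the benchmarks above), the start offsets used by the "direct access" version can be precomputed with itertools.accumulate, turning the loop into a single comprehension; collect_accumulate is just an illustrative name:

```python
from itertools import accumulate

def collect_accumulate(x, lns):
    # accumulate(lns, initial=0) yields the running sums 0, 2, 4, 5
    # for lns = [2, 2, 1], i.e. the start offset of every chunk
    # (the initial= keyword requires Python 3.8+)
    offsets = accumulate(lns, initial=0)
    return [x[start:start + ln] for start, ln in zip(offsets, lns)]

x = [(0.8, "a"), (-0.4, "b"), (1.0, "c"), (0.5, "d"), (-0.7, "e")]
print(collect_accumulate(x, [2, 2, 1]))
```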
Upvotes: 2
Reputation: 9572
Not so Pythonic, but a for loop will do:
x = [(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
lns = [2, 2, 1]

res = []
start = 0
for ln in lns:
    res.append(x[start:start+ln])
    start += ln

print(res)
Output:
[[(0.8, 'A'), (-0.4, 'B')], [(1.0, 'C'), (0.5, 'D')], [(-0.7, 'E')]]
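If the lengths might not add up to the size of the input, a small wrapper around the same loop can fail loudly instead of silently dropping items (split_by_lengths is a hypothetical helper name):

```python
def split_by_lengths(x, lns):
    # defensive variant: verify the pattern covers the whole list
    if sum(lns) != len(x):
        raise ValueError("lengths do not sum to len(x)")
    res = []
    start = 0
    for ln in lns:
        res.append(x[start:start + ln])
        start += ln
    return res

print(split_by_lengths([(0.8, 'A'), (-0.4, 'B'), (1.0, 'C')], [2, 1]))
```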
Upvotes: 2
Reputation: 19414
Create an iterator over the list and simply pull the indicated number of items each time:
l = [(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
sizes = [2, 2, 1]
it = iter(l)
res = [[next(it) for _ in range(size)] for size in sizes]
print(res)
Will give:
[[(0.8, 'A'), (-0.4, 'B')],
[(1.0, 'C'), (0.5, 'D')],
[(-0.7, 'E')]]
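The inner comprehension can also be handed to itertools.islice, which pulls each chunk out of the shared iterator in one call instead of a Python-level next() per element (an equivalent sketch):

```python
from itertools import islice

l = [(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
sizes = [2, 2, 1]

it = iter(l)
# islice(it, size) yields the next `size` items from the shared iterator
res = [list(islice(it, size)) for size in sizes]
print(res)
```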
Upvotes: 0
Reputation: 2518
I would just use a list comprehension:
source =[(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
lns = [2,2,1]
res = [[source.pop(0) for _ in range(ln)] for ln in lns]
print(res)
Output:
[[(0.8, 'A'), (-0.4, 'B')], [(1.0, 'C'), (0.5, 'D')], [(-0.7, 'E')]]
Warning: This solution consumes source, meaning that source will be an empty list after creating res. If you do not want that to happen, you have to create a copy of the list first.
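For example, popping from a shallow copy leaves source intact (the work name is just for illustration):

```python
source = [(0.8, 'A'), (-0.4, 'B'), (1.0, 'C'), (0.5, 'D'), (-0.7, 'E')]
lns = [2, 2, 1]

work = source.copy()  # pop from a shallow copy, not the original
res = [[work.pop(0) for _ in range(ln)] for ln in lns]

print(res)
print(source)  # still holds all five tuples
```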
Upvotes: 0
Reputation: 18106
Iterate over your 'pattern' and consume your 'long list':
source = [(0.8, "A"), (-0.4, "B"), (1.0, "C"), (0.5, "D"), (-0.7, "E")]
pattern = [2, 2, 1]

res = []
for size in pattern:
    # pop(0) takes the first element out of the list
    res.append([source.pop(0) for x in range(size)])

print(res)
One-Liner:
res = [[source.pop(0) for x in range(size)] for size in pattern]
Out:
[[(0.8, 'A'), (-0.4, 'B')], [(1.0, 'C'), (0.5, 'D')], [(-0.7, 'E')]]
Upvotes: 1