Reputation: 41
I am trying to get flat lists out of a nested list of dictionaries.
list_of_data = [{'id': 99,
                 'rocketship': {'price': [10, 10, 10, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 898,
                 'rocketship': {'price': [10, 10, 10, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 903,
                 'rocketship': {'price': [20, 20, 20, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 999,
                 'rocketship': {'price': [20, 20, 20, 10, 10],
                                'ytd': [1, 3, 4.05, 1.1, 1.18]}},
                ]
price, ytd = map(list, zip(*((list_of_data[i]['rocketship']['price'], list_of_data[i]['rocketship']['ytd']) for i in range(0, len(list_of_data)))))
My expected output is below, but I am getting something different:
price = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10]
ytd = [1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]
Instead, I am getting this:
price
Out[19]:
[[10, 10, 10, 10, 10],
[10, 10, 10, 10, 10],
[20, 20, 20, 10, 10],
[20, 20, 20, 10, 10]]
Upvotes: 3
Views: 214
Reputation: 374
Instead of passing the list function to map, you could pass itertools.chain.from_iterable to merge all the individual lists. Then call list() afterwards to turn each resulting iterator into a list.
import itertools
price_gen, ytd_gen = map(itertools.chain.from_iterable, zip(*((i['rocketship']['price'], i['rocketship']['ytd']) for i in list_of_data)))
price = list(price_gen)
ytd = list(ytd_gen)
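To see why the original map(list, ...) call returns nested lists, here is a minimal sketch on a made-up two-record sample (the `sample` data and the `pairs` helper are only for illustration). zip(*pairs) groups all the price lists into one tuple and all the ytd lists into another; list only turns that tuple of lists into a list of lists, whereas chain.from_iterable walks into each inner list.

```python
import itertools

# Hypothetical two-record sample with the same shape as list_of_data.
sample = [
    {'id': 1, 'rocketship': {'price': [10, 10], 'ytd': [1, 1.05]}},
    {'id': 2, 'rocketship': {'price': [20, 10], 'ytd': [3, 4.05]}},
]

def pairs():
    # One (price_list, ytd_list) tuple per record.
    return ((d['rocketship']['price'], d['rocketship']['ytd']) for d in sample)

# zip(*pairs) yields (all price lists), then (all ytd lists).
nested_price, nested_ytd = map(list, zip(*pairs()))
print(nested_price)  # [[10, 10], [20, 10]]

# chain.from_iterable flattens each group of lists instead.
flat_price, flat_ytd = (
    list(it) for it in map(itertools.chain.from_iterable, zip(*pairs()))
)
print(flat_price)  # [10, 10, 20, 10]
```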
However, creating separate generators for each dataset actually seems to be much faster: about 7x faster in my test.
import itertools
price_gen = itertools.chain.from_iterable(d['rocketship']['price'] for d in list_of_data)
ytd_gen = itertools.chain.from_iterable(d['rocketship']['ytd'] for d in list_of_data)
price = list(price_gen)
ytd = list(ytd_gen)
Maybe it's the zip that slows things down?
cProfile comparison of the different solutions presented in this post, running each one 99,999 times on the small original dataset:
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
 99999    0.132    0.000    1.344    0.000  (opt_khanh)
 99999    0.469    0.000    0.714    0.000  (opt_shawn)
 99999    0.142    0.000    0.535    0.000  (opt_Jaeyoon)
 99999    0.267    0.000    0.413    0.000  (opt_ramesh)
 99999    0.076    0.000    0.399    0.000  (opt_abdo)
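For anyone who wants to check the zip question on their own machine, here is a rough timeit sketch of the two variants from this answer. The function names `zipped` and `separate` and the shortened dataset are my own choices for illustration; timings will vary by machine and Python version, so no numbers are claimed here.

```python
import itertools
import timeit

# Shortened sample with the same shape as the question's list_of_data.
list_of_data = [
    {'id': 99, 'rocketship': {'price': [10] * 5, 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]

def zipped():
    # Single pass that builds (price, ytd) pairs, then transposes them with zip.
    price_gen, ytd_gen = map(
        itertools.chain.from_iterable,
        zip(*((d['rocketship']['price'], d['rocketship']['ytd']) for d in list_of_data)),
    )
    return list(price_gen), list(ytd_gen)

def separate():
    # Two independent passes over the data, one per key.
    price = list(itertools.chain.from_iterable(d['rocketship']['price'] for d in list_of_data))
    ytd = list(itertools.chain.from_iterable(d['rocketship']['ytd'] for d in list_of_data))
    return price, ytd

# Both variants must agree before the timings mean anything.
assert zipped() == separate()

for fn in (zipped, separate):
    elapsed = timeit.timeit(fn, number=10_000)
    print(f"{fn.__name__}: {elapsed:.3f}s")
```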
Upvotes: 1
Reputation: 585
Try the loop below.
Update: thanks to @shawn caza, here is a performance test for 100,000 loops:
shawncaza's answer: 0.10945558547973633 seconds
my answer with the get method: 0.1443953514099121 seconds
my answer with the square bracket method: 0.10936307907104492 seconds
list_of_data = [{'id': 99,
'rocketship': {'price': [10, 10, 10, 10, 10],
'ytd': [1, 1, 1.05, 1.1, 1.18]}},
{'id': 898,
'rocketship': {'price': [10, 10, 10, 10, 10],
'ytd': [1, 1, 1.05, 1.1, 1.18]}},
{'id': 903,
'rocketship': {'price': [20, 20, 20, 10, 10],
'ytd': [1, 1, 1.05, 1.1, 1.18]}},
{'id': 999,
'rocketship': {'price': [20, 20, 20, 10, 10],
'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]
price = []
ytd = []
for i in list_of_data:
    price.extend(i['rocketship']['price'])
    ytd.extend(i['rocketship']['ytd'])
print(price)
print(ytd)
>>> [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10]
>>> [1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]
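The get variant from the timings is not shown in this answer. One plausible form, assuming the point of get is to tolerate records that lack a 'rocketship' entry or one of its keys, might look like the sketch below. The `flatten_field` helper and the incomplete sample records are hypothetical.

```python
def flatten_field(records, field):
    """Concatenate record['rocketship'][field] over all records, skipping missing entries."""
    flat = []
    for record in records:
        # dict.get returns the given default instead of raising KeyError.
        flat.extend(record.get('rocketship', {}).get(field, []))
    return flat

# Hypothetical sample: two of the records are incomplete on purpose.
records = [
    {'id': 99, 'rocketship': {'price': [10, 10], 'ytd': [1, 1.05]}},
    {'id': 100},                                     # no 'rocketship' key at all
    {'id': 903, 'rocketship': {'price': [20, 10]}},  # no 'ytd' key
]

print(flatten_field(records, 'price'))  # [10, 10, 20, 10]
print(flatten_field(records, 'ytd'))    # [1, 1.05]
```

With square brackets the same data would raise KeyError on the second record, which may be what you want if missing keys indicate a bug.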
Upvotes: 4
Reputation: 542
I traded a bit of readability for performance here
import functools
tuples = ((item['rocketship']['price'], item['rocketship']['ytd']) for item in list_of_data)
price, ytd = functools.reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), tuples, ([], []))
I tried to keep things in a single loop and use a generator to reduce memory use. But if the data is big, the resulting price and ytd lists will be big too; hopefully you have thought about that already.
Update:
Thanks to @j1-lee's performance test, I reworked the code as follows:
import functools

def extend_list(a, b):
    # Extend a in place and return it so it can be used inside the lambda.
    a.extend(b)
    return a

tuples = ((item['rocketship']['price'], item['rocketship']['ytd'])
          for item in list_of_data)
price, ytd = map(
    list,
    functools.reduce(
        lambda a, b: (extend_list(a[0], b[0]), extend_list(a[1], b[1])),
        tuples,
        ([], [])
    )
)
This reduces the execution time from 45.556s to 0.096s. My best guess is that the + operator creates a new list from the two old lists, which requires copying both of them into the new one, so it goes like this:
list(4) + list(4) = list(8) # 8 copies
list(8) + list(4) = list(12) # 12 copies
list(12) + list(4) = list(16) # 16 copies
...
Using .extend() only needs to copy the items of the additional list into the existing one, so it should be faster:
list(4).extend(list(4)) = list(8) # 4 copies
list(8).extend(list(4)) = list(12) # 4 copies
list(12).extend(list(4)) = list(16) # 4 copies
...
It would be better if someone could point to the specific documentation on this, though.
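The copying behaviour described above can be checked directly. Here is a small sketch: + returns a new list object, while extend mutates the existing one in place. The helper names `concat_with_plus` and `concat_with_extend` and the chunk sizes are arbitrary choices for illustration, and the printed timings will depend on your machine.

```python
import timeit

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

c = a + b               # builds a brand-new list from both operands
print(c is a)           # False

before = id(a)
a.extend(b)             # appends b's items to the existing list object
print(id(a) == before)  # True

def concat_with_plus(chunks):
    out = []
    for chunk in chunks:
        out = out + chunk   # copies everything accumulated so far on every iteration
    return out

def concat_with_extend(chunks):
    out = []
    for chunk in chunks:
        out.extend(chunk)   # only the new chunk's items are added
    return out

chunks = [[0] * 4 for _ in range(2_000)]
assert concat_with_plus(chunks) == concat_with_extend(chunks)

print("plus:  ", timeit.timeit(lambda: concat_with_plus(chunks), number=5))
print("extend:", timeit.timeit(lambda: concat_with_extend(chunks), number=5))
```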
Upvotes: 2
Reputation: 699
I tried a double comprehension. I don't know whether it's a good idea, as it could hurt code readability.
price = [
    item
    for sublist in [rocket["rocketship"]["price"] for rocket in list_of_data]
    for item in sublist
]
ytd = [
    item
    for sublist in [rocket["rocketship"]["ytd"] for rocket in list_of_data]
    for item in sublist
]
print(price)
print(ytd)
Upvotes: 0
Reputation: 94
Using list comprehension:
price, ytd = ([i for item in list_of_data for i in item["rocketship"]["price"]],
              [i for item in list_of_data for i in item["rocketship"]["ytd"]])
Output
price: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10]
ytd: [1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]
Upvotes: 2
Reputation: 1654
Perform a list comprehension and flatten your result.
ytd = sum([d['rocketship']['ytd'] for d in list_of_data], [])
price = sum([d['rocketship']['price'] for d in list_of_data], [])
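One caveat: sum(..., []) concatenates with + at every step, so it builds a new list for each record, which can get slow on large inputs. itertools.chain.from_iterable is a common alternative that gives the same result. A minimal sketch on a shortened sample of the data (the two-record `list_of_data` here is only for illustration):

```python
import itertools

# Shortened sample with the same shape as the question's list_of_data.
list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10], 'ytd': [1, 1.05]}},
    {'id': 999, 'rocketship': {'price': [20, 10], 'ytd': [3, 4.05]}},
]

ytd_lists = [d['rocketship']['ytd'] for d in list_of_data]

ytd_sum = sum(ytd_lists, [])                                # repeated list + list
ytd_chain = list(itertools.chain.from_iterable(ytd_lists))  # single pass over the items

print(ytd_sum)               # [1, 1.05, 3, 4.05]
print(ytd_sum == ytd_chain)  # True
```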
Upvotes: 1