Anjat

Reputation: 41

Getting a flat list out of a nested list in Python

I am trying to get a single flat list out of a nested list.

list_of_data = [{'id':99,
                 'rocketship':{'price':[10, 10, 10, 10, 10], 
                               'ytd':[1, 1, 1.05, 1.1, 1.18]}},
                {'id':898,
                 'rocketship':{'price':[10, 10, 10, 10, 10], 
                               'ytd':[1, 1, 1.05, 1.1, 1.18]}},
                {'id':903,
                 'rocketship':{'price':[20, 20, 20, 10, 10], 
                               'ytd':[1, 1, 1.05, 1.1, 1.18]}},
                {'id':999,
                 'rocketship':{'price':[20, 20, 20, 10, 10], 
                               'ytd':[1, 3, 4.05, 1.1, 1.18]}},
                ]

price, ytd = map(list, zip(*((list_of_data[i]['rocketship']['price'], list_of_data[i]['rocketship']['ytd']) for i in range(0, len(list_of_data)))))

My expected output is:

price = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10]

ytd = [1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]

But I am getting this instead:
price
Out[19]: 
[[10, 10, 10, 10, 10],
 [10, 10, 10, 10, 10],
 [20, 20, 20, 10, 10],
 [20, 20, 20, 10, 10]]
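The one-liner returns nested lists because zip(*...) transposes the (price, ytd) pairs: its first item is a tuple holding every price list and its second a tuple holding every ytd list, and map(list, ...) only turns those outer tuples into lists without flattening them. A minimal sketch of one possible fix (an illustrative variant, not taken from the question) that keeps the zip but flattens each column with a nested comprehension:

```python
# Same data as in the question.
list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]

# One (price_list, ytd_list) pair per record.
pairs = [(d['rocketship']['price'], d['rocketship']['ytd']) for d in list_of_data]

# zip(*pairs) transposes: the first column holds all price lists, the second all ytd lists.
price_column, ytd_column = zip(*pairs)
print(len(price_column))  # 4 sublists, not 20 numbers

# Flatten each column into a single list instead of just converting it with list().
price, ytd = ([x for sublist in column for x in sublist] for column in zip(*pairs))
print(price)
print(ytd)
```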


Upvotes: 3

Views: 214

Answers (6)

shawn caza

Reputation: 374

Instead of passing the list function to your map, you could pass itertools.chain.from_iterable to merge all the individual lists. Then call list() on each result to turn the iterator into a list:

import itertools
price_gen, ytd_gen = map(itertools.chain.from_iterable, zip(*((i['rocketship']['price'], i['rocketship']['ytd']) for i in list_of_data)))

price = list(price_gen)
ytd = list(ytd_gen)

However, creating separate generators for each dataset actually seems to be much faster: about 7x faster in my test.

import itertools
price_gen = itertools.chain.from_iterable(d['rocketship']['price'] for d in list_of_data)
ytd_gen = itertools.chain.from_iterable(d['rocketship']['ytd'] for d in list_of_data)

price = list(price_gen)
ytd = list(ytd_gen)

Maybe it's the zip that slows things down?
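To check the speed difference on your own machine, a rough timeit sketch might look like the following. The function names via_zip and via_separate_generators are hypothetical wrappers around the two snippets above, and the numbers you get will vary with Python version, hardware, and data size, so no particular result is claimed here.

```python
import itertools
import timeit

# Same data as in the question.
list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]


def via_zip(data):
    # Transpose (price, ytd) pairs, then chain each column into one iterator.
    price_gen, ytd_gen = map(
        itertools.chain.from_iterable,
        zip(*((d['rocketship']['price'], d['rocketship']['ytd']) for d in data)),
    )
    return list(price_gen), list(ytd_gen)


def via_separate_generators(data):
    # Walk the data once per field, chaining that field's sublists.
    price = list(itertools.chain.from_iterable(d['rocketship']['price'] for d in data))
    ytd = list(itertools.chain.from_iterable(d['rocketship']['ytd'] for d in data))
    return price, ytd


# Both approaches must agree before their timings are worth comparing.
assert via_zip(list_of_data) == via_separate_generators(list_of_data)

for func in (via_zip, via_separate_generators):
    seconds = timeit.timeit(lambda: func(list_of_data), number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 runs")
```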

cProfile comparison of the solutions presented in this post, each run 99,999 times on the small original dataset:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    99999    0.132    0.000    1.344    0.000 (opt_khanh)
    99999    0.469    0.000    0.714    0.000 (opt_shawn)
    99999    0.142    0.000    0.535    0.000 (opt_Jaeyoon)
    99999    0.267    0.000    0.413    0.000 (opt_ramesh)
    99999    0.076    0.000    0.399    0.000 (opt_abdo)

Upvotes: 1

Ramesh

Reputation: 585

Try this:

Update

Thanks to @shawn caza's performance test. Results for 100,000 loops:

shawncaza answer: 0.10945558547973633 seconds

my answer with get method: 0.1443953514099121 seconds

my answer with square bracket method: 0.10936307907104492 seconds

list_of_data = [{'id': 99,
                 'rocketship': {'price': [10, 10, 10, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 898,
                 'rocketship': {'price': [10, 10, 10, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 903,
                 'rocketship': {'price': [20, 20, 20, 10, 10],
                                'ytd': [1, 1, 1.05, 1.1, 1.18]}},
                {'id': 999,
                 'rocketship': {'price': [20, 20, 20, 10, 10],
                                'ytd': [1, 3, 4.05, 1.1, 1.18]}},
                ]
price = []
ytd = []
for i in list_of_data:
    price.extend(i['rocketship']['price'])
    ytd.extend(i['rocketship']['ytd'])
print(price)
print(ytd)

[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10]
[1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]
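The update mentions a version using the get method, but that code isn't shown. A guess at what it might look like (hypothetical, and not necessarily what the timing above measured) is below; the point of .get() here is that a record missing 'rocketship', 'price', or 'ytd' contributes nothing instead of raising KeyError. The function name collect_price_ytd and the sample records are illustrative only.

```python
def collect_price_ytd(data):
    """Flatten every record's price and ytd lists, skipping missing keys."""
    price, ytd = [], []
    for item in data:
        # Fall back to empty containers when a key is absent.
        rocketship = item.get('rocketship', {})
        price.extend(rocketship.get('price', []))
        ytd.extend(rocketship.get('ytd', []))
    return price, ytd


# Hypothetical sample records with some keys missing.
records = [
    {'id': 99, 'rocketship': {'price': [10, 10], 'ytd': [1, 1.05]}},
    {'id': 100},                                 # no 'rocketship': skipped, no KeyError
    {'id': 101, 'rocketship': {'price': [20]}},  # no 'ytd'
]
price, ytd = collect_price_ytd(records)
print(price)  # [10, 10, 20]
print(ytd)    # [1, 1.05]
```

The trade-off is that missing data is silently skipped; the square-bracket version fails loudly instead, which may be preferable if every record is supposed to be complete.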

Upvotes: 4

Khanh Luong

Reputation: 542

I traded a bit of readability for performance here:

import functools

tuples = ((item['rocketship']['price'], item['rocketship']['ytd']) for item in list_of_data)
price, ytd = functools.reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), tuples, ([], []))

I tried to keep everything in a single loop and use a generator to reduce memory use. But if the data is big, the resulting price and ytd lists will be big too; hopefully you have already thought about that.

Update:

Thanks to @j1-lee's performance test, I rewrote the code as follows:

import functools


def extend_list(a, b):
    a.extend(b)
    return a


tuples = ((item['rocketship']['price'], item['rocketship']['ytd'])
          for item in list_of_data)
price, ytd = map(
    list,
    functools.reduce(
        lambda a, b: (extend_list(a[0], b[0]), extend_list(a[1], b[1])),
        tuples,
        ([], [])
    )
)

This reduced the execution time from 45.556s to 0.096s. My best guess is that the + operator creates a new list from the two old lists, which requires copying both of them into the new one, so it goes like this:

list(4) + list(4) = list(8)  # 8 copies
list(8) + list(4) = list(12)  # 12 copies
list(12) + list(4) = list(16)  # 16 copies
...

Using .extend() only needs to copy the additional list onto the end of the existing one, so it should be faster:

list(4).extend(list(4)) = list(8)  # 4 copies
list(8).extend(list(4)) = list(12)  # 4 copies
list(12).extend(list(4)) = list(16)  # 4 copies
...
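Under the answer's simplified model (n sublists of length k, reduce starting from an empty list, and each element copied counted as one unit of work), the totals work out as:

```latex
C_{+} = \sum_{i=1}^{n} i\,k = \frac{k\,n(n+1)}{2} = O(n^{2}k)
\qquad
C_{\mathrm{extend}} = \sum_{i=1}^{n} k = n\,k = O(nk)
```

So the + version grows quadratically with the number of sublists while the .extend() version stays linear. CPython lists over-allocate, so extend occasionally resizes and copies the existing items, but that cost is amortized and keeps the total linear.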

It would be great if someone could point to specific documentation on this, though.
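The standard library already has a function that does what extend_list does: operator.iadd(a, b) performs a += b and returns the result, and for lists += extends in place (like .extend()) rather than building a new list. A sketch swapping it in (a substitution, not part of the answer above), reducing one field at a time:

```python
import functools
import operator

# Same data as in the question.
list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]

# Each step runs acc += sublist, extending acc in place and returning it.
# The fresh [] initializer is the list that gets extended, so the sublists
# inside list_of_data are never modified. Without it, reduce would start
# from the first sublist and mutate the original data.
price = functools.reduce(operator.iadd, (d['rocketship']['price'] for d in list_of_data), [])
ytd = functools.reduce(operator.iadd, (d['rocketship']['ytd'] for d in list_of_data), [])

print(price)
print(ytd)
```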

Upvotes: 2

Jaeyoon Jeong

Reputation: 699

I tried a double (nested) list comprehension. I'm not sure it's a good idea, as it could hurt readability.

price = [
    item
    for sublist in [rocket["rocketship"]["price"] for rocket in list_of_data]
    for item in sublist
]

ytd = [
    item
    for sublist in [rocket["rocketship"]["ytd"] for rocket in list_of_data]
    for item in sublist
]

print(price)
print(ytd)

Upvotes: 0

Abdo Sabry

Reputation: 94

Using list comprehensions:

price, ytd = ([i for item in list_of_data for i in item["rocketship"]["price"]],
              [i for item in list_of_data for i in item["rocketship"]["ytd"]])

Output

price: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 10, 10, 20, 20, 20, 10, 10] 

ytd: [1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 1, 1.05, 1.1, 1.18, 1, 3, 4.05, 1.1, 1.18]

Upvotes: 2

Ankit Sharma

Reputation: 1654

Perform a list comprehension and flatten your result.

ytd = sum([d['rocketship']['ytd'] for d in list_of_data], [])
price = sum([d['rocketship']['price'] for d in list_of_data], [])
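One caveat: sum(list_of_lists, []) creates a new list at every addition, so it has the same quadratic copying described in Khanh Luong's answer, and the Python documentation for sum() suggests itertools.chain() for concatenating a series of iterables. A sketch with chain, wrapped in a hypothetical helper flatten_field so the key appears only once per call:

```python
from itertools import chain

# Same data as in the question.
list_of_data = [
    {'id': 99, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 898, 'rocketship': {'price': [10, 10, 10, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 903, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 1, 1.05, 1.1, 1.18]}},
    {'id': 999, 'rocketship': {'price': [20, 20, 20, 10, 10], 'ytd': [1, 3, 4.05, 1.1, 1.18]}},
]


def flatten_field(data, field):
    """Concatenate d['rocketship'][field] for every record d, in order."""
    # chain.from_iterable walks each sublist lazily; list() copies each element once.
    return list(chain.from_iterable(d['rocketship'][field] for d in data))


ytd = flatten_field(list_of_data, 'ytd')
price = flatten_field(list_of_data, 'price')
print(price)
print(ytd)
```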

Upvotes: 1
