dogs22

Reputation: 1

How do I extract chunks from a list in Python?

If I have a list that is made up of 1MM ids, how would I pull from that list in intervals of 50k?

For example:

cusid = df['customer_id'].unique().tolist()
len(cusid)  # 1,000,500

If I want to pull in chunks, is the below correct for 50k?

cusid=cusid[:50000] - first 50k ids
cusid=cusid[50000:100001] - the next 50k of ids
cusid=cusid[100001:150001] - the next 50k 

Are my interval selections correct?

Thanks!

Upvotes: 0

Views: 77

Answers (2)

Slam

Reputation: 8572

A couple of things to mention:

  1. It seems you're using the "data science" stack for your work, so there's a good chance you have numpy available. Please take a look at numpy.array_split: you can calculate the number of chunks once and let numpy hand back views into the array. Most probably this is a lot faster than converting numpy arrays into native Python lists.

  2. The idiomatic Python approach (IMO) would be to use an iterator together with islice:

    from itertools import islice
    # create an iterator from your array/list; this is a cheap operation
    iterator = iter(cusid)
    
    # for element-wise operations, you can use the chunk in loops or in functions that iterate over it
    # this is really memory-efficient, as the whole chunk is never held in memory
    chunk = islice(iterator, 50000)
    s = sum(chunk)
    
    # in case you really need the whole chunk in memory, just turn the islice into a list
    chunk = list(islice(iterator, 50000))
    last_in_chunk = chunk[-1]
    
    # and you always use the same code to consume the next chunk from your source,
    # without maintaining any counters
    next_chunk = list(islice(iterator, 50000))
    

When your iterator is exhausted (there are no values left), you will get empty chunks. When there aren't enough elements left to fill a whole chunk, you will get however many remain.
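To consume every chunk without repeating the islice line by hand, the same idea can be wrapped in a small generator. This is only a sketch: `iter_chunks` is a name made up for illustration, and the short `ids` list stands in for the real `cusid` list.

```python
from itertools import islice

def iter_chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:  # iterator exhausted: islice produced nothing
            return
        yield chunk

# Small stand-in for the real id list (assumption: any iterable works here).
ids = list(range(12))
chunks = list(iter_chunks(ids, 5))
print(chunks)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```

With the real data you would loop with something like `for chunk in iter_chunks(cusid, 50000): ...`, and the last chunk would simply be shorter than the rest.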

Upvotes: 1

Tirterra

Reputation: 696

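cusid2 = [cusid[a:a+50000] for a in range(0, len(cusid), 50000)]

This is a list comprehension: it adds the slice cusid[a:a+50000] to the new list for every a from 0 up to len(cusid), with a going up by 50k on each iteration. Using len(cusid) as the upper bound, rather than a hard-coded 950000, makes sure the final chunks are included: range excludes its stop value, so range(0, 950000, 50000) would end at a = 900000 and skip everything from index 950000 onwards.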
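As a quick sanity check of the slicing approach, here is a sketch that uses a stand-in list of the same length as in the question. `slice_chunks` is a hypothetical helper name, and the integer ids are an assumption made only for the demo. Note that a slice's end index is exclusive, so the second chunk is cusid[50000:100000], not cusid[50000:100001].

```python
def slice_chunks(items, size):
    """Return consecutive slices of `items`, each at most `size` long."""
    return [items[start:start + size] for start in range(0, len(items), size)]

# Stand-in for the real list: same length as in the question (1,000,500 ids).
cusid = list(range(1_000_500))
chunks = slice_chunks(cusid, 50_000)

print(len(chunks))                  # 21
print(len(chunks[0]))               # 50000
print(len(chunks[-1]))              # 500
print(chunks[1][0], chunks[1][-1])  # 50000 99999
```

The 21st chunk holds the leftover 500 ids, and no id appears in two chunks.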

Upvotes: 1
