back-new

Reputation: 121

best way to process large data in chunks

I have, for example, more than 20,000 records, and the data looks like this:

data = [{'id': 1}, {'id': 2}, {'id': 3}, ..., {'id': 20000}]

Now I want to upload this data in chunks of 1000. What is the best way to do this with the least overhead?

Upvotes: 0

Views: 894

Answers (2)

user10869670

Reputation: 106

You can use a generator to process the data in batches:

def generateChunks(data, batchsize):
    # Yield successive batchsize-sized slices of data.
    for i in range(0, len(data), batchsize):
        yield data[i:i + batchsize]

Then process the data:

chunks = generateChunks(data, 1000)
for i, chunk in enumerate(chunks):
    print('chunk #', i, ':', chunk)
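Assuming a hypothetical `upload_batch` in place of your real API or database call, the whole flow looks like this (the chunking function is included so the snippet runs on its own):

```python
def generateChunks(data, batchsize):
    # Yield successive batchsize-sized slices of data.
    for i in range(0, len(data), batchsize):
        yield data[i:i + batchsize]

uploaded = []  # record batch sizes so the sketch is checkable

def upload_batch(batch):
    # Hypothetical stand-in for your real upload call.
    uploaded.append(len(batch))

data = [{'id': i} for i in range(1, 20001)]
for chunk in generateChunks(data, 1000):
    upload_batch(chunk)
# 20 calls of 1000 records each
```

Because the generator yields one slice at a time, only the current batch plus the original list is held in memory, so there is no large intermediate list of chunks.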

Upvotes: 0

Daniel Trugman

Reputation: 8491

The best way is to iterate lazily instead of building intermediate lists. In your case, an easy solution is range with a step argument, which produces the start index of each chunk on demand, for example:

range(0, len(data), 1000)

will generate the values 0, 1000, 2000, ...

If you use that in a loop, you can pass each slice to a handler method, for example:

batch = 1000
for i in range(0, len(data), batch):
    handle(data[i:i + batch])
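This slicing approach needs len(), so it only works on sequences. For sources without a length, such as a database cursor or another generator, a similar chunker can be sketched with itertools.islice, which works on any iterable (the name `chunked` here is just an illustration):

```python
from itertools import islice

def chunked(iterable, size):
    # Repeatedly pull up to `size` items; stops when the source is exhausted.
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

data = ({'id': i} for i in range(1, 20001))  # a one-shot iterator, no len()
batches = list(chunked(data, 1000))
```

On Python 3.12+, the standard library's itertools.batched provides the same behaviour (yielding tuples rather than lists).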

Upvotes: 2
