mrgloom

Reputation: 21602

How to work with large files in python?

I'm curious how to work with large files in Python.

For example, I have a dataset of about 20 GB on my hard drive (just an array of numbers), and I want to sort this array to get the k smallest values. The dataset can't be loaded into memory (RAM).

I think the algorithm should be: load the dataset in n chunks, find the k smallest values in each chunk and keep them in memory, and repeat this for every chunk. That gives k*n values, which can then be sorted to get the k smallest values overall.
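A minimal sketch of this chunked idea, assuming the file is a raw binary dump of 8-byte floats (typecode "d"). The function name `k_smallest_from_file` and the `chunk_items` parameter are hypothetical, chosen only for illustration. Instead of collecting k*n candidates and sorting them at the end, it keeps a running list of the k smallest values seen so far, so memory use stays at roughly one chunk plus k values.

```python
import heapq
from array import array
from itertools import chain


def k_smallest_from_file(path, k, chunk_items=1_000_000, typecode="d"):
    """Return the k smallest values in a raw binary file, in ascending order."""
    item_size = array(typecode).itemsize
    best = []  # the k smallest values seen so far
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_items * item_size)
            if not raw:
                break  # end of file
            chunk = array(typecode)
            # Ignore a trailing partial item if the file is truncated.
            chunk.frombytes(raw[: len(raw) - len(raw) % item_size])
            # Merge the previous candidates with the new chunk, keep only k.
            best = heapq.nsmallest(k, chain(best, chunk))
    return best
```

The chunk size here is only a placeholder. The best value depends on the hardware and is easiest to find by timing a few sizes.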

But the question is how to store the dataset (what format?) and what the fastest method is to load it from disk (what chunk size should I choose for particular hardware?). Maybe it can be done using several threads?

Upvotes: 1

Views: 160

Answers (1)

AechoLiu

Reputation: 18368

You need an external sort. Loading everything into memory and sorting it there is called an internal sort. Databases use external sorting for this kind of task, where the data does not fit in memory.
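As a rough illustration, here is a sketch of an external merge sort using only the standard library. It assumes the input is a raw binary file of 8-byte floats. The names `read_chunks`, `read_values`, and `external_sort` are made up for this example. Sorted runs are written to temporary files and then combined with `heapq.merge`, which reads them lazily.

```python
import heapq
import os
import tempfile
from array import array


def read_chunks(path, chunk_items, typecode="d"):
    """Yield arrays of up to chunk_items values from a raw binary file."""
    item_size = array(typecode).itemsize
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_items * item_size)
            if not raw:
                return
            chunk = array(typecode)
            # Ignore a trailing partial item if the file is truncated.
            chunk.frombytes(raw[: len(raw) - len(raw) % item_size])
            yield chunk


def read_values(path, typecode="d", buffer_items=65_536):
    """Yield single values from a file, reading a small buffer at a time."""
    for chunk in read_chunks(path, buffer_items, typecode):
        yield from chunk


def external_sort(src, dst, chunk_items=1_000_000, typecode="d"):
    """Sort the values in src into dst without loading the whole file."""
    run_paths = []
    try:
        # Phase 1: sort each chunk in memory and write it out as a run.
        for chunk in read_chunks(src, chunk_items, typecode):
            fd, run_path = tempfile.mkstemp(suffix=".run")
            run_paths.append(run_path)
            with os.fdopen(fd, "wb") as out:
                array(typecode, sorted(chunk)).tofile(out)
        # Phase 2: merge all sorted runs, buffering the output writes.
        runs = [read_values(p, typecode) for p in run_paths]
        with open(dst, "wb") as out:
            buf = array(typecode)
            for value in heapq.merge(*runs):
                buf.append(value)
                if len(buf) >= 65_536:
                    buf.tofile(out)
                    buf = array(typecode)
            buf.tofile(out)
    finally:
        # Remove the temporary run files.
        for p in run_paths:
            os.remove(p)
```

The first k values of the sorted output are then the k minimum values. Note that the merge phase keeps every run file open at once, so `chunk_items` should be large enough that the number of runs stays under the operating system's open-file limit.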

Maybe the following resources would help you.

Upvotes: 1
