Eric Bal

Reputation: 1185

Randomly Select Data from A Large Text File

I am working with 8 GB of RAM on 64-bit Windows 7.

I have a text file of 30GB with one column of numeric data.

I have to randomly select 5% of its lines. I started as follows:

fi = open("data.txt")
# Reads the whole 30 GB file into memory at once
lines = fi.read().splitlines()

This raises a MemoryError.

Do you have any ideas, guys?

Upvotes: 0

Views: 90

Answers (1)

Tim Pietzcker

Reputation: 336138

If "about 5%" is good enough for you, you could read the file line by line and give each line a 5% chance of being included in your list, so that only the selected lines are held in memory:

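import random
result = []
with open("data.txt") as f:
    # Iterating over the file object reads one line at a time
    for line in f:
        # random.random() is uniform on [0, 1), so this is true about 5% of the time
        if random.random() < 0.05:
            result.append(line)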
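
If you need exactly 5% of the lines rather than about 5%, one possible variant is a two-pass selection sample: count the lines first, then keep each line with probability (lines still needed) / (lines still remaining). This is a different technique from the loop above, offered only as a rough sketch. The function name sample_exact and the output file name are made up for illustration, and the sketch writes the chosen lines to a second file instead of a list, on the assumption that 5% of a 30 GB file may not fit comfortably in 8 GB of RAM as Python strings.

```python
import random


def sample_exact(src_path, dst_path, fraction=0.05, seed=None):
    """Copy exactly round(fraction * N) randomly chosen lines to dst_path.

    Hypothetical helper for illustration. Lines keep their original order.
    """
    rng = random.Random(seed)

    # Pass 1: count the lines without keeping any of them in memory.
    with open(src_path) as f:
        total = sum(1 for _ in f)

    needed = round(total * fraction)  # lines still to be selected
    remaining = total                 # lines not yet examined
    written = 0

    # Pass 2: keep each line with probability needed / remaining.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if rng.random() * remaining < needed:
                dst.write(line)
                needed -= 1
                written += 1
            remaining -= 1

    return written
```

Calling sample_exact("data.txt", "sample.txt") would read the input twice, which costs extra time on a file this size, but memory use stays small because no more than one line is held at a time.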

Upvotes: 4
