I am working with 8 GB of RAM on 64-bit Windows 7.
I have a 30 GB text file with one column of numeric data.
I need to randomly select 5% of its lines. I started as follows:
```python
with open("data.txt") as fi:
    lines = fi.read().splitlines()
```
This raises a `MemoryError`.
Do you have any ideas, guys?
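If "about 5%" is good enough for you, you could read the file line by line and give each line a 5% chance of being included in your list. Iterating over the file object reads one line at a time, so the whole 30 GB file is never loaded into memory at once: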
```python
import random

result = []
with open("data.txt") as f:
    for line in f:
        if random.random() < 0.05:
            result.append(line)
```
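A minimal sketch extending the same idea, under two assumptions that are not part of the original answer. First, the selected lines are written straight to an output file instead of being collected in a list, so memory use stays flat even though 5% of 30 GB can still be a lot of text. Second, for when exactly 5% is required, the file is read twice: once to count its lines, then once more applying Knuth's selection sampling (Algorithm S), which keeps each line with probability needed / remaining. The function names, the `seed` parameter and the output path are hypothetical.

```python
import random


def sample_lines_approx(src_path, dst_path, p=0.05, seed=None):
    """Write each line of src_path to dst_path with probability p.

    Only the current line is held in memory; the number of lines kept
    is about p * N, not exactly. Returns the number of lines written.
    """
    rng = random.Random(seed)
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if rng.random() < p:
                dst.write(line)
                kept += 1
    return kept


def sample_lines_exact(src_path, dst_path, fraction=0.05, seed=None):
    """Write exactly round(fraction * N) randomly chosen lines, in file order.

    Assumes 0 <= fraction <= 1. Pass 1 counts the N lines. Pass 2 is
    selection sampling (Knuth's Algorithm S): each line is kept with
    probability needed / remaining. Returns the number of lines written.
    """
    rng = random.Random(seed)
    with open(src_path) as src:
        total = sum(1 for _ in src)  # constant memory, but a full extra read

    needed = round(fraction * total)  # lines still to select
    remaining = total                 # lines not yet examined
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if needed == 0:
                break
            # randrange(remaining) < needed has probability needed / remaining;
            # once needed == remaining, every remaining line is taken.
            if rng.randrange(remaining) < needed:
                dst.write(line)
                needed -= 1
                kept += 1
            remaining -= 1
    return kept
```

For example, `sample_lines_exact("data.txt", "sample.txt", 0.05)` would write exactly 5% of the lines (rounded) to `sample.txt`. The trade-off is that the exact variant reads the 30 GB file twice, while the approximate one reads it once.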