Reputation: 55
I'm working on a project in statistical machine translation in which I have 15 files in a folder (linenumberfiles/). Each file contains multiple line numbers in the following format (one line number per line):
12
15
19
I would like to extract 10 random line numbers from each of the 15 files to a single output file (OutputLinesFile). The tricky part is that a few of the files might contain fewer than 10 line numbers, in which case I'd like to extract as many line numbers as possible to the output file. The format of the output file should be the same as the input files (one line number per line). This is the code I have so far:
import glob
OutputLinesFile = open('OutputLineNumbers', 'w')
inputfiles = glob.glob('linenumberfiles/*')
for file in inputfiles:
    readfile = open(file).readlines()
    OutputLinesFile.write(str(readfile))
OutputLinesFile.close()
Has anyone got any ideas how to solve this problem? In advance, thanks for your help!
Upvotes: 1
Views: 117
Reputation: 250961
You can use random.shuffle and list slicing here:
import glob
import random
count = 10  # fetch at most this number of lines per file
with open('OutputLineNumbers', 'w') as fout:
    inputfiles = glob.glob('linenumberfiles/*')
    for file in inputfiles:
        with open(file) as f:
            lines = f.readlines()
            random.shuffle(lines)  # shuffle the lines in place
            fout.writelines(lines[:count])  # write at most the first 10 lines
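or using random.randrange (note that this draws with replacement, so it always returns count lines, may repeat some of them, and fails on an empty file):
    lines = f.readlines()
    lines = [lines[random.randrange(0, len(lines))] for _ in range(count)]
and then: fout.writelines(lines)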
Upvotes: 2
Reputation: 6812
First of all, you should use the with statement. Read here why. Example:
try:
    with open(file, 'r') as f:
        cont = f.readlines()
except IOError as err:
    print(err)
Then you should have a look at the random module. To select random items from cont, use the random.sample function. To check how many lines are in the input file, just use the built-in function len().
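A minimal sketch of that idea, assuming the folder and output names from the question. The function names pick_random_lines and write_random_line_numbers are made up for illustration. Since random.sample raises ValueError when asked for more items than the list holds, the sample size is capped with len():

```python
import glob
import os
import random


def pick_random_lines(lines, count=10):
    """Return up to `count` distinct entries from `lines`, in random order."""
    # random.sample raises ValueError if asked for more items than exist,
    # so cap the sample size at the number of available lines.
    return random.sample(lines, min(count, len(lines)))


def write_random_line_numbers(input_dir, output_path, count=10):
    """Write up to `count` random lines from every file in `input_dir`."""
    with open(output_path, 'w') as fout:
        for path in sorted(glob.glob(os.path.join(input_dir, '*'))):
            with open(path) as f:
                # Skip blank lines and strip the line endings.
                lines = [line.strip() for line in f if line.strip()]
            for line in pick_random_lines(lines, count):
                fout.write(line + '\n')
```

Calling write_random_line_numbers('linenumberfiles', 'OutputLineNumbers') should then produce one line number per line, with fewer than 10 entries for any file that is shorter than that.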
Upvotes: 0