user2468610

Reputation: 55

How to extract line numbers from multiple files to a single file

I'm working on a project in statistical machine translation in which I have 15 files in a folder (linenumberfiles/). Each file contains multiple line numbers in the following format (one line number per line):

12
15
19

I would like to extract 10 random line numbers from each of the 15 files into a single output file (OutputLinesFile). The tricky part is that a few of the files might contain fewer than 10 line numbers, in which case I'd like to extract as many line numbers as possible to the output file. The format of the output file should be the same as the input files (one line number per line). This is the code I have so far:

import glob
OutputLinesFile = open('OutputLineNumbers', 'w')
inputfiles=glob.glob('linenumberfiles/*')

for file in inputfiles:
    readfile=open(file).readlines()
    OutputLinesFile.write( str(readfile) )
OutputLinesFile.close() 

Has anyone got any ideas how to solve this problem? In advance, thanks for your help!

Upvotes: 1

Views: 117

Answers (2)

Ashwini Chaudhary

Reputation: 250961

You can use random.shuffle and list slicing here:

import glob
import random
count = 10      # fetch at most this number of lines per file

with open('OutputLineNumbers', 'w') as fout:
    inputfiles = glob.glob('linenumberfiles/*')
    for file in inputfiles:
        with open(file) as f:
            lines = f.readlines()
        random.shuffle(lines)           # shuffle the lines in place
        fout.writelines(lines[:count])  # write at most the first 10 lines

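or using random.randrange (note that this draws with replacement, so the same line can be picked more than once, and a non-empty file always yields count lines even if it has fewer):

lines = f.readlines()
# randrange(0, 0) raises ValueError, so skip empty files
if lines:
    lines = [lines[random.randrange(0, len(lines))] for _ in range(count)]

and then: fout.writelines(lines)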

Upvotes: 2

LarsVegas

Reputation: 6812

First of all, you should use the with statement, which closes the file for you even if an exception is raised. Example:

try:
    with open(file, 'r') as f:
        cont = f.readlines()
except IOError as err:
    # e.g. the file is missing or unreadable
    print(err)

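Then you should have a look at the random module. To select random items from the list of lines (cont), use random.sample. It raises ValueError if you ask for more items than the list holds, so check how many lines are in the input file with the built-in function len() and cap the sample size accordingly.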
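A minimal sketch of how those pieces could fit together, assuming the folder and output names from the question. The helper names sample_lines and collect_samples are made up for illustration, and skipping blank lines is an assumption about the input files:

```python
import glob
import random


def sample_lines(path, count=10):
    """Return up to `count` randomly chosen lines from the file at `path`."""
    with open(path) as f:
        # Assumption: blank lines carry no data, so drop them. Re-adding
        # "\n" also covers a last line that lacks a trailing newline.
        lines = [line.strip() + "\n" for line in f if line.strip()]
    # random.sample raises ValueError when asked for more items than
    # exist, so cap the sample size at the number of available lines.
    return random.sample(lines, min(count, len(lines)))


def collect_samples(pattern, out_path, count=10):
    """Write up to `count` random lines from each file matching `pattern`."""
    with open(out_path, "w") as fout:
        for path in sorted(glob.glob(pattern)):
            fout.writelines(sample_lines(path, count))
```

Calling collect_samples('linenumberfiles/*', 'OutputLineNumbers') would then write at most 10 line numbers per input file, without repeating a line from the same file. Files with fewer than 10 entries contribute everything they have.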

Upvotes: 0
