Reputation: 871

Python: Concise / elegant way to reformat a set of text files?

I have written a Python script to process a set of ASCII files within a given directory. Is there a more concise and/or "pythonesque" way to do it without losing readability?

Python Code

import os
import fileinput
import glob
import string

indir='./'
outdir='./processed/'

for filename in glob.glob(indir+'*.asc'): # get a list of input ASCII files to be processed
    fin=open(indir+filename,'r')   # input file
    fout=open(outdir+filename,'w') # out: processed file

    lines = iter(fileinput.input([indir+filename])) # iterator over all lines in the input file
    fout.write(next(lines)) # just copy the first line (the header) to output

    for line in lines:
        val=iter(string.split(line,' '))
        fout.write('{0:6.2f}'.format(float(val.next()))), # first value in the line has its own format
        for x in val: # iterate over the rest of the numbers in the line
            fout.write('{0:10.6f}'.format(float(val.next()))),  # the rest of the values in the line have a different format
        fout.write('\n')

    fin.close()
    fout.close()

An example:

Input:

;;; This line is the header line
-5.0 1.090074154029272 1.0034662411357929 0.87336062116561186 0.78649408279093869 0.65599958665017222 0.4379879132749317 0.26310799350679176 0.087808018565486673
-4.9900000000000002 1.0890770415316042 1.0025480136545413 0.87256100700428996 0.78577373527626004 0.65539842673645277 0.43758616966566649 0.26286647978335914 0.087727357602906453
-4.9800000000000004 1.0880820021223023 1.0016316956763136 0.87176305623792771 0.78505488659611744 0.65479851808106115 0.43718526271594083 0.26262546925502467 0.087646864773454014
-4.9700000000000006 1.0870890372077564 1.0007172884938402 0.87096676998908273 0.78433753775986659 0.65419986152386733 0.4367851929843618 0.26238496225635727 0.087566540188423345
-4.9600000000000009 1.086098148170821 0.99980479337809591 0.87017214936140763 0.78362168975984026 0.65360245789061966 0.4363859610200459 0.26214495911617541 0.087486383957276398

Processed:

;;; This line is the header line
-5.00  1.003466  0.786494  0.437988  0.087808
-4.99  1.002548  0.785774  0.437586  0.087727
-4.98  1.001632  0.785055  0.437185  0.087647
-4.97  1.000717  0.784338  0.436785  0.087567
-4.96  0.999805  0.783622  0.436386  0.087486

Upvotes: 2

Answers (4)

stranac

Reputation: 28266

fin=open(indir+filename,'r')   # input file
fout=open(outdir+filename,'w') # out: processed file
#code
fin.close()
fout.close()

can be written as:

with open(indir+filename,'r') as fin, open(outdir+filename,'w') as fout:
    #code

In Python 2.6, which does not accept several context managers in a single with statement, you can nest them instead:

with open(indir+filename,'r') as fin:
    with open(outdir+filename,'w') as fout:
        #code

And the line

lines = iter(fileinput.input([indir+filename]))

is useless. You can just iterate over an open file (fin in your case).

You can also do line.split(' ') instead of string.split(line, ' ').

If you change those things, there is no need to import string and fileinput.
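Putting those changes together, here is a minimal sketch of what the per-file part could look like. The function name reformat_file is made up for illustration, and the sketch assumes every value on a line should be written out; it is one possible arrangement, not the only one.

```python
def reformat_file(in_path, out_path):
    """Copy the header, then rewrite each data line with fixed-width numbers.

    reformat_file is a hypothetical name used only for this sketch.
    """
    # One with statement manages both files (Python 2.7+ / 3.1+); both are
    # closed automatically, even if a line fails to parse.
    with open(in_path, 'r') as fin, open(out_path, 'w') as fout:
        fout.write(next(fin))         # the open file is already a line iterator
        for line in fin:
            values = line.split(' ')  # str method, no string module needed
            fout.write('{0:6.2f}'.format(float(values[0])))
            for value in values[1:]:
                fout.write('{0:10.6f}'.format(float(value)))
            fout.write('\n')
```

With this version neither string nor fileinput is imported, and there are no explicit close() calls left to forget.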

Edit: I didn't know you can use inline code. That's cool

Upvotes: 1

Andrew Dalke

Reputation: 15335

Other than a few minor changes, due to how Python has changed through time, this looks fine.

You're mixing two different styles of next(): the old way was it.next() and the new is next(it). You should use the string method split() instead of going through the string module (that module is there mostly for backwards compatibility with Python 1.x). There's no need to go through the almost useless "fileinput" module, since open file handles are also iterators (that module comes from a time before Python's file handles were iterators).
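To make the old-versus-new point concrete, here is a small sketch (the sample string and variable names are only for illustration). It uses the built-in next() and the str method, then checks that the older spellings are not available on Python 3.

```python
import string

fields = iter('-5.0 1.09 1.00'.split(' '))  # str.split(), not string.split()
first = next(fields)   # built-in next(); available since Python 2.6
rest = list(fields)    # whatever the iterator has not produced yet

# On Python 3 the old spellings are gone entirely.
old_method_exists = hasattr(fields, 'next')     # False on Python 3
old_function_exists = hasattr(string, 'split')  # False on Python 3
```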

Edit: As @codeape pointed out, glob() returns the full path. Your code would not have worked if indir was something other than "./". I've changed the following to use the correct listdir/os.path.join solution. I'm also more familiar with the "%" string interpolation than string formatting.

Here's how I would write this in more idiomatic modern Python:

import os

indir = './'             # same directories as in the question
outdir = './processed/'

def reformat(fin, fout):
    fout.write(next(fin)) # just copy the first line (the header) to output
    for line in fin:
        fields = line.split(' ')

        # Make a format string specific to the number of fields
        fmt = '%6.2f' + ('%10.6f' * (len(fields)-1)) + '\n'

        fout.write(fmt % tuple(map(float, fields)))

basenames = os.listdir(indir)  # names only, without the directory part
for basename in basenames:
    if not basename.endswith('.asc'):  # listdir() returns everything; keep only the ASCII input files
        continue
    input_filename = os.path.join(indir, basename)
    output_filename = os.path.join(outdir, basename)
    with open(input_filename, 'r') as fin, open(output_filename, 'w') as fout:
        reformat(fin, fout)
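If you would rather keep glob for the pattern matching, a sketch along these lines avoids the doubled-directory problem mentioned in the edit note. The helper name list_io_paths and its default pattern are assumptions made for this example.

```python
import glob
import os

def list_io_paths(indir, outdir, pattern='*.asc'):
    """Return (input_path, output_path) pairs for files in indir matching pattern.

    list_io_paths is a hypothetical helper, not part of any library.
    """
    pairs = []
    for input_path in sorted(glob.glob(os.path.join(indir, pattern))):
        # glob() already includes indir in each result, so only the base
        # name is carried over to the output directory.
        basename = os.path.basename(input_path)
        pairs.append((input_path, os.path.join(outdir, basename)))
    return pairs
```

Each pair can then be opened with the same with statement as in the loop above.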

The Zen of Python says, "There should be one-- and preferably only one --obvious way to do it". It's interesting how you used functions which, at some point during the last 10+ years, were each "obviously" the right solution, but no longer are. :)

Upvotes: 5

Ski

Reputation: 14497

I don't understand why you use string.split(line, ' ') instead of just line.split(' ').

I would probably write the string-processing part like this:

values = line.split(' ')
values[0] = '{0:6.2f}'.format(float(values[0]))
values[1:] = ['{0:10.6f}'.format(float(v)) for v in values[1:]]
# float() ignores the trailing newline of the last field, so add it back
fout.write(' '.join(values) + '\n')

At least for me this looks better, but that might be subjective :)

Instead of indir I would use os.curdir. Instead of "./processed" I would do: os.path.join(os.curdir, 'processed').
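As a rough sketch of that last suggestion: the helper below builds the output path from os.curdir and, as an extra assumption that is not in the question, creates the directory if it is missing. The name output_dir is made up for the example.

```python
import os

def output_dir(base=os.curdir, name='processed'):
    """Return the path of the output directory under base, creating it if needed.

    output_dir is a hypothetical helper; creating the directory is an
    assumption about what the script should do when it is missing.
    """
    path = os.path.join(base, name)
    os.makedirs(path, exist_ok=True)  # Python 3.2+; no error if it already exists
    return path
```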

Upvotes: 0

Oliver

Reputation: 11617

In my build script, I have this code:

inFile = open(sourceFile,'r')
outFile = open(targetFile,'w')
for line in inFile:
    line = doKeywordSubstitution(line)
    outFile.write(line)
inFile.close()
outFile.close()

I don't know of a way to make this any more concise. Putting the line-changing logic in a different function looks neater to me though.
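To illustrate the separate-function idea, here is a small sketch in which the per-line logic is passed in as a callable. The name transform_lines is hypothetical, and str.upper merely stands in for whatever the real line-changing function would be.

```python
import io

def transform_lines(in_file, out_file, transform):
    """Write transform(line) to out_file for every line of in_file."""
    for line in in_file:
        out_file.write(transform(line))

# In-memory files keep the demonstration self-contained.
source = io.StringIO('alpha\nbeta\n')
target = io.StringIO()
transform_lines(source, target, str.upper)  # str.upper is a stand-in transform
```

Because the loop knows nothing about the formatting, the line-changing function can be tried on single strings without touching any files.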

I may be missing the point of your code, but I don't understand why you have lines = iter(fileinput.input([indir+filename])).