Umut Tabak

Reputation: 1942

change the numbers in the 1st column

I know sed or awk could probably handle this kind of problem more elegantly, but I went the Python way. I would like to renumber the first column of my data file from 1 to the number of lines in the file. Is it a good idea to read the file with readlines()? For small files perhaps, but I suppose not for large ones. Here is what I came up with as a first attempt; any comments are appreciated.

#!/usr/bin/env python

import sys

try:
    infilename = sys.argv[1]; outfilename = sys.argv[2];
except:
    print "Usage is <script> inFile outFile"

ifile = open(infilename,'r')
ofile = open(outfilename, 'w')

lines = ifile.readlines();

i=1
for line in lines: 
    list = line.split();
    list[0] = i
    i += 1 
    for val in list:
        ofile.write("%d " % int(val))
    ofile.write('\n')
    del list

ifile.close()
ofile.close()

Upvotes: 1

Views: 961

Answers (5)

Mark Tolonen

Reputation: 178115

You don't need to split the whole line, just split the first column:

for i,line in enumerate(ifile,1):
    first,remaining = line.split(' ',1)
    ofile.write("{0} {1}".format(i,remaining))

Also, your except block needs to exit (for example with sys.exit), or the rest of the script will run anyway and fail with a NameError because infilename was never set.
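A minimal sketch of both points, assuming whitespace-separated columns and at least two columns per line. The names `parse_args` and `renumber` are hypothetical, not from the question, and the code is written so that it also runs on Python 3.

```python
import sys


def parse_args(argv):
    """Return (infile, outfile), or exit with a usage message."""
    if len(argv) != 3:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit("Usage is <script> inFile outFile")
    return argv[1], argv[2]


def renumber(lines):
    """Yield each line with its first column replaced by 1, 2, 3, ..."""
    for i, line in enumerate(lines, 1):
        # maxsplit=1 splits off only the first column; the remainder,
        # including its trailing newline, is left untouched
        first, remaining = line.split(None, 1)
        yield "{0} {1}".format(i, remaining)
```

Because `renumber` accepts any iterable of lines, an open file object can be passed in directly and `ofile.writelines(renumber(ifile))` would write the result.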

Upvotes: 0

Devin

Reputation: 2133

#!/usr/bin/env python
import sys

try:
    ifile = open(sys.argv[1], 'r')
    ofile = open(sys.argv[2], 'w')
# IndexError: a filename is missing; IOError: a file could not be opened
except (IndexError, IOError):
    print "Usage is <script> inFile outFile"
else:
    for i, line in enumerate(ifile, start=1):
        items = [str(i)] + line.split()[1:]
        ofile.write(' '.join(items) + '\n')

    ifile.close()
    ofile.close()

There are a few points I'd like to discuss about my answer. The first is the try block, where I check that I can open the files. If no filenames are given, or if either file can't be opened, you'll get the usage message. You could of course break this up: check that both arguments were given and print the usage message if not, then try opening the files and report a file-opening failure separately. Or you could catch specific exceptions and print different messages.

Next, enumerate is a convenient way to have the interpreter keep track of the index. In the loop itself, I prepend the enumeration index to a slice of the split line (everything but the first item), then join the items with a space and write them out with a newline.

This is clear and short.
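The "break this up" variant mentioned above might look roughly like the following. This is only a sketch: `open_files` is a hypothetical helper name, and the error messages are made up for illustration. It uses `sys.exit` with a string so that it runs on Python 2 and 3 alike.

```python
import sys


def open_files(argv):
    """Return (ifile, ofile), exiting with a specific message on failure."""
    try:
        infilename, outfilename = argv[1], argv[2]
    except IndexError:
        # too few command-line arguments
        sys.exit("Usage is <script> inFile outFile")
    try:
        ifile = open(infilename, 'r')
    except IOError as err:
        sys.exit("Cannot open input file: %s" % err)
    try:
        ofile = open(outfilename, 'w')
    except IOError as err:
        ifile.close()  # don't leave the input file open
        sys.exit("Cannot open output file: %s" % err)
    return ifile, ofile
```

The caller would then do `ifile, ofile = open_files(sys.argv)` and run the same enumerate loop as before.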

Upvotes: 1

Jochen Ritzel

Reputation: 107726

You can iterate over the file to keep only the current line in memory:

#!/usr/bin/env python
import sys

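try:
    # don't use ; !
    infilename = sys.argv[1]
    outfilename = sys.argv[2]
except IndexError:
    print "Usage is <script> inFile outFile"
    # exit here, otherwise the code below runs without the filenames
    sys.exit(1)


# you could use `with` here (Python 2.6+; 2.7 allows both files in one `with`)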
ifile = open(infilename,'r')
ofile = open(outfilename, 'w')

# no need to count yourself, enumerate does that
# plus when you iterate over a file you get lines too
for i, line in enumerate(ifile, start=1):
    # don't shadow builtins like `list`
    parts = line.split()
    parts[0] = i
    # join is the inverse function to split
    new_line = ' '.join("%d" % int(val) for val in parts)
    ofile.write(new_line + '\n')

ifile.close()
ofile.close()

@Umut Tabak: ("%d" % int(val) for val in parts) is a generator expression, which is a kind of lazy list. It yields the same items as the list comprehension ["%d" % int(val) for val in parts], but without building the list yourself first.
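A small illustration of that difference, using made-up sample data (the variable names are only for this example):

```python
parts = ["7", "8", "9"]

as_list = ["%d" % int(val) for val in parts]   # builds the whole list now
as_gen = ("%d" % int(val) for val in parts)    # computes nothing yet

# join consumes either one and produces the same string
joined_from_list = ' '.join(as_list)
joined_from_gen = ' '.join(as_gen)

# a generator can be consumed only once; by now it is exhausted
leftover = list(as_gen)
```

Both joined strings are identical, while `leftover` ends up empty because the generator was already used up by `join`.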

Btw, the for block can be written even shorter, but it's slightly different because it doesn't enforce that all lines are ints anymore:

for i, line in enumerate(ifile, start=1):
    parts = line.split()
    parts[0] = "%d" % i
    new_line = ' '.join(parts)
    ofile.write(new_line + '\n')
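To make the "slightly different" part concrete, here is a sketch that wraps the two loop bodies in functions. `renumber_strict` and `renumber_lenient` are hypothetical names used only for this comparison.

```python
def renumber_strict(line, i):
    """First variant: every column must parse as an int."""
    parts = line.split()
    parts[0] = i
    return ' '.join("%d" % int(val) for val in parts)


def renumber_lenient(line, i):
    """Shorter variant: the other columns are copied through as text."""
    parts = line.split()
    parts[0] = "%d" % i
    return ' '.join(parts)


def strict_rejects(line):
    """Return True if the strict variant raises ValueError for this line."""
    try:
        renumber_strict(line, 1)
    except ValueError:
        return True
    return False
```

With a line such as "5 2.5 x", the strict variant raises ValueError, while the lenient one simply writes the remaining columns back unchanged.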

Upvotes: 1

virhilo

Reputation: 6793

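with open(infilename, 'r') as ifile:
    with open(outfilename, 'w') as ofile:
        # start counting at 1, not at enumerate's default of 0
        for nr, line in enumerate(ifile, 1):
            parts = line.split()
            # join needs strings, so convert the line number
            parts[0] = str(nr)
            ofile.write(' '.join(parts) + '\n')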

Upvotes: 1

Greg Hewgill

Reputation: 994491

Don't do the readlines() at all, and instead:

for line in ifile: 

Also, avoid naming a variable list. Since list is a built-in type, you're shadowing that name, which is poor practice.

There is no need to del a local variable as you've done with del list; Python reclaims the object automatically once nothing refers to it. (CPython does this deterministically through reference counting, with a separate collector for reference cycles.)
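A short sketch of the first two points, using an in-memory stream in place of a real file. `count_columns` and `shadowing_demo` are hypothetical names for this illustration only.

```python
import io


def count_columns(stream):
    """Iterate over a file-like object lazily, one line at a time."""
    widths = []
    for line in stream:        # no readlines(): one line in memory at a time
        parts = line.split()   # `parts`, not `list`, so the built-in stays usable
        widths.append(len(parts))
    return widths


def shadowing_demo():
    """Show what breaks once a variable named `list` hides the built-in."""
    list = "1 2 3".split()
    try:
        list("abc")            # now calls the list object, not the type
    except TypeError:
        return "built-in list is hidden"
    return "built-in list still works"


data = io.StringIO(u"1 2 3\n4 5\n")
widths = count_columns(data)
```

A real file object opened with open() can be iterated the same way as the `io.StringIO` stand-in used here.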

Upvotes: 1
