Reputation: 1942
I know sed or awk can probably tackle this kind of problem more elegantly, but I went the Python way. I would like to renumber the first column of my data file from 1 to the number of lines in the file. Is it a good idea to read the file with readlines()? For small files perhaps, but I suppose not for large ones. Here is what I came up with as a first attempt; any comments are appreciated.
#!/usr/bin/env python
import sys
try:
    infilename = sys.argv[1]; outfilename = sys.argv[2];
except:
    print "Usage is <script> inFile outFile"
ifile = open(infilename,'r')
ofile = open(outfilename, 'w')
lines = ifile.readlines();
i=1
for line in lines:
    list = line.split();
    list[0] = i
    i += 1
    for val in list:
        ofile.write("%d " % int(val))
    ofile.write('\n')
    del list
ifile.close()
ofile.close()
Upvotes: 1
Views: 961
Reputation: 178115
You don't need to split the whole line, just split the first column:
for i,line in enumerate(ifile,1):
    first,remaining = line.split(' ',1)
    ofile.write("{0} {1}".format(i,remaining))
Also, your except block needs to exit, or the rest of the script will run anyway.
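A minimal sketch of the "split only the first column" idea as a reusable function, written for Python 3 (the snippets in this thread use the Python 2 print statement). The name renumber_first_column is hypothetical. It uses split(None, 1) instead of split(' ', 1) so that tabs or leading whitespace don't break the split; that is an assumption about the data, not something the question states.

```python
import io

def renumber_first_column(ifile, ofile):
    """Replace the first whitespace-separated field of each line with 1, 2, 3, ..."""
    for i, line in enumerate(ifile, 1):
        # maxsplit=1: only the first column is split off, the rest stays untouched
        fields = line.split(None, 1)
        if len(fields) > 1:
            # fields[1] still carries the original spacing and the trailing newline
            ofile.write("{0} {1}".format(i, fields[1]))
        else:
            # single-column or blank line: nothing follows the number
            ofile.write("{0}\n".format(i))

# in-memory demonstration; real code would pass open file objects instead
src = io.StringIO("7 10 20\n9 30 40\n")
dst = io.StringIO()
renumber_first_column(src, dst)
print(dst.getvalue(), end="")
```

Because the remainder of each line is written back unchanged, the other columns are never parsed or converted.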
Upvotes: 0
Reputation: 2133
#!/usr/bin/env python
import sys
try:
    ifile = open(sys.argv[1], 'r')
    ofile = open(sys.argv[2], 'w+')
except:
    print "Usage is <script> inFile outFile"
else:
    for i, line in enumerate(ifile, start=1):
        items = [str(i)] + line.split()[1:]
        ofile.write(' '.join(items) + '\n')
    ifile.close()
    ofile.close()
There are a few points I'd like to discuss about my answer. The first is the try block, where I check that I can open the files. If no filenames are given, or if either file can't be opened, you'll get the usage message. You could of course break this up: check that both arguments are present and print the usage message if not, then try opening the files and report an opening failure if that doesn't work. Or you could catch specific exceptions and return different messages.
Next, enumerate is a convenient way to have the interpreter keep track of the index. In the loop itself, I combine the enumeration index with a slice of the split line (everything but the first item), then join those with a space and write them out with a newline.
This is clear and short.
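The "break this up" variant mentioned above could look roughly like the following sketch, written for Python 3. The function names parse_args, open_files and main, the exit codes and the wording of the second message are all hypothetical; only the usage string comes from the answer.

```python
import sys

def parse_args(argv):
    """Return (infilename, outfilename), or None if the arguments are missing."""
    if len(argv) != 3:
        return None
    return argv[1], argv[2]

def open_files(infilename, outfilename):
    """Open both files; lets IOError/OSError propagate with the real cause."""
    ifile = open(infilename, "r")
    try:
        ofile = open(outfilename, "w")
    except (IOError, OSError):
        ifile.close()  # don't leak the input handle if the output can't be opened
        raise
    return ifile, ofile

def main(argv):
    names = parse_args(argv)
    if names is None:
        # wrong number of arguments: usage message only
        sys.stderr.write("Usage is <script> inFile outFile\n")
        return 2
    try:
        ifile, ofile = open_files(*names)
    except (IOError, OSError) as exc:
        # arguments were fine, but a file could not be opened
        sys.stderr.write("Could not open file: {0}\n".format(exc))
        return 1
    with ifile, ofile:
        for i, line in enumerate(ifile, start=1):
            items = [str(i)] + line.split()[1:]
            ofile.write(" ".join(items) + "\n")
    return 0
```

In a script this would typically be called as sys.exit(main(sys.argv)), so that the shell sees a non-zero status when something went wrong.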
Upvotes: 1
Reputation: 107726
You can iterate over the file to keep only the current line in memory:
#!/usr/bin/env python
import sys
try:
    # don't use ; !
    infilename = sys.argv[1]
    outfilename = sys.argv[2]
except:
    print "Usage is <script> inFile outFile"
    # exit here, otherwise the code below runs with undefined names
    sys.exit(1)
# you could use `with` here if you have Python 2.7
ifile = open(infilename,'r')
ofile = open(outfilename, 'w')
# no need to count yourself, enumerate does that
# plus when you iterate over a file you get lines too
for i, line in enumerate(ifile, start=1):
    # don't shadow builtins like `list`
    parts = line.split()
    parts[0] = i
    # join is the inverse function to split
    new_line = ' '.join("%d" % int(val) for val in parts)
    ofile.write(new_line + '\n')
ifile.close()
ofile.close()
@Umut Tabak: ("%d" % int(val) for val in parts) is a generator expression; these are kind of like lazy lists. It gives the same items as the list comprehension ["%d" % int(val) for val in parts], but without actually creating the list.
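A small illustration of that difference, written for Python 3; the sample values in parts are made up for the demonstration.

```python
parts = ["7", "10", "20"]

as_list = ["%d" % int(val) for val in parts]   # builds the whole list immediately
as_gen = ("%d" % int(val) for val in parts)    # produces items only when consumed

joined_from_list = " ".join(as_list)
joined_from_gen = " ".join(as_gen)             # join() consumes the generator

# both forms yield the same items, so the joined strings are equal
print(joined_from_list == joined_from_gen)

# unlike a list, a generator can only be consumed once
leftover = list(as_gen)
print(leftover)
```

The second print shows an empty list, because the generator was already exhausted by join().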
Btw, the for block can be written even shorter, but it's slightly different because it no longer enforces that all values are ints:
for i, line in enumerate(ifile, start=1):
    parts = line.split()
    parts[0] = "%d" % i
    new_line = ' '.join(parts)
    ofile.write(new_line + '\n')
Upvotes: 1
Reputation: 6793
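with open(infilename,'r') as ifile:
    with open(outfilename, 'w') as ofile:
        # start counting at 1, as the question asks
        for (nr, line) in enumerate(ifile, 1):
            line = line.split()
            # join() only accepts strings, so convert the number
            line[0] = str(nr)
            # add the newline after joining to avoid a trailing space
            ofile.write(' '.join(line) + '\n')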
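On Python 2.7 and Python 3, the two nested with blocks can be combined into a single statement. The following is a sketch for Python 3 under that assumption; the function name renumber and the temporary file names are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile

def renumber(infilename, outfilename):
    # both files are closed when the block exits, even if an exception is raised
    with open(infilename, "r") as ifile, open(outfilename, "w") as ofile:
        for nr, line in enumerate(ifile, 1):
            # replace the first field with the line number, keep the other fields
            fields = [str(nr)] + line.split()[1:]
            ofile.write(" ".join(fields) + "\n")

# demonstration with throwaway files in a temporary directory
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.txt")
dst = os.path.join(workdir, "out.txt")
with open(src, "w") as f:
    f.write("5 1 2\n5 3 4\n")

renumber(src, dst)

with open(dst) as f:
    result = f.read()
print(result, end="")
```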
Upvotes: 1
Reputation: 994491
Don't call readlines() at all; instead, iterate over the file directly:
for line in ifile:
Also, avoid naming a variable list. Since list() is a built-in function, you're shadowing that name, which is poor practice.
There is no need to del a local variable as you've done with del list; Python's memory management takes care of this automatically. (In CPython, objects are reference-counted, so the old list is freed deterministically as soon as nothing refers to it.)
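To make the shadowing problem concrete, here is a small Python 3 sketch; the function name shadowing_breaks_builtin is made up for the illustration.

```python
def shadowing_breaks_builtin():
    list = "1 2 3".split()      # the local name now hides the built-in list()
    try:
        list("abc")             # this calls the list object, not the built-in
    except TypeError as exc:
        return str(exc)
    return None

message = shadowing_breaks_builtin()
print(message)
```

The call fails with a TypeError because the name list now refers to an ordinary list object inside the function. Outside the function the built-in is unaffected, which is part of why such bugs can be confusing to track down.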
Upvotes: 1