Reputation: 12246
I am currently reproducing the following Unix command:
cat command.info fort.13 > command.fort.13
in Python with the following:
with open('command.fort.13', 'w') as outFile:
    with open('fort.13', 'r') as fort13, open('command.info', 'r') as com:
        for line in com.read().split('\n'):
            if line.strip() != '':
                print >>outFile, line
        for line in fort13.read().split('\n'):
            if line.strip() != '':
                print >>outFile, line
which works, but there has to be a better way. Any suggestions?
Edit (2016):
This question has started getting attention again after four years. I wrote up some thoughts in a longer Jupyter Notebook here.
The crux of the issue is that my question pertained to the (to me, unexpected) behavior of readlines. The question I was aiming at could have been asked better, and it would have been better answered with read().splitlines().
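To illustrate the difference mentioned in the edit, here is a small sketch (written for Python 3; the helper name `three_ways` and the sample text are made up for illustration) that applies `readlines()`, `read().split('\n')`, and `read().splitlines()` to the same data:

```python
import io

# Hypothetical sample data: two lines, a blank line, and a trailing newline.
SAMPLE = "alpha\nbeta\n\ngamma\n"

def three_ways(text):
    """Return the results of three line-splitting approaches on the same text."""
    # readlines() keeps the trailing '\n' on every line.
    with_readlines = io.StringIO(text).readlines()
    # split('\n') drops the '\n' but leaves an empty string after the final newline.
    with_split = io.StringIO(text).read().split('\n')
    # splitlines() drops the '\n' and produces no extra trailing element.
    with_splitlines = io.StringIO(text).read().splitlines()
    return with_readlines, with_split, with_splitlines
```

Printing each element of `three_ways(SAMPLE)` shows that only `readlines()` keeps the newline characters, which is why writing those lines with an added newline double-spaces the output.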
Upvotes: 12
Views: 61277
Reputation: 11571
The easiest way might be simply to forget about the lines, and just read in the entire file, then write it to the output:
with open('command.fort.13', 'wb') as outFile:
    with open('command.info', 'rb') as com, open('fort.13', 'rb') as fort13:
        outFile.write(com.read())
        outFile.write(fort13.read())
As pointed out in a comment, this can cause high memory usage if either of the inputs is large (as it copies the entire file into memory first). If this might be an issue, the following will work just as well (by copying the input files in chunks):
import shutil
with open('command.fort.13', 'wb') as outFile:
    with open('command.info', 'rb') as com, open('fort.13', 'rb') as fort13:
        shutil.copyfileobj(com, outFile)
        shutil.copyfileobj(fort13, outFile)
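For readers wondering what "copying in chunks" amounts to, here is a rough hand-written sketch of the idea. It is not shutil's actual implementation; the names `copy_in_chunks` and `concatenate` and the 64 KiB chunk size are arbitrary choices made for this example.

```python
def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy from binary file object src to dst, at most chunk_size bytes at a time."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        dst.write(chunk)

def concatenate(out_name, *in_names):
    """Write the contents of in_names, in order, to out_name."""
    with open(out_name, 'wb') as out_file:
        for name in in_names:
            with open(name, 'rb') as in_file:
                copy_in_chunks(in_file, out_file)
```

With these helpers, `concatenate('command.fort.13', 'command.info', 'fort.13')` would produce the same file as the cat command while holding at most one chunk in memory at a time.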
Upvotes: 17
Reputation: 414179
#!/usr/bin/env python
import fileinput
for line in fileinput.input():
    print line,
Usage:
$ python cat.py command.info fort.13 > command.fort.13
Or to allow arbitrarily large lines:
#!/usr/bin/env python
import sys
from shutil import copyfileobj as copy
for filename in sys.argv[1:] or ["-"]:
    if filename == "-":
        copy(sys.stdin, sys.stdout)
    else:
        with open(filename, 'rb') as file:
            copy(file, sys.stdout)
The usage is the same.
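On Python 3, `sys.stdout` and `sys.stdin` are text streams, so bytes read from a file opened in `'rb'` mode cannot be written to them directly. A possible adaptation, offered only as a sketch, wraps the loop in a function (the name `cat` and its `out` parameter are choices made for this example) and uses the underlying binary streams:

```python
import sys
from shutil import copyfileobj

def cat(filenames, out):
    """Copy each named file (or standard input for "-") to the binary stream out."""
    for filename in filenames or ["-"]:
        if filename == "-":
            # sys.stdin.buffer is the binary stream underneath the text-mode stdin.
            copyfileobj(sys.stdin.buffer, out)
        else:
            with open(filename, 'rb') as infile:
                copyfileobj(infile, out)
```

Calling `cat(sys.argv[1:], sys.stdout.buffer)` at the bottom of the script would give it the same command-line behavior as the version above.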
Or, on Python 3.3, using os.sendfile():
#!/usr/bin/env python3.3
import os
import sys
output_fd = sys.stdout.buffer.fileno()
for filename in sys.argv[1:]:
    with open(filename, 'rb') as file:
        while os.sendfile(output_fd, file.fileno(), None, 1 << 30) != 0:
            pass
The above sendfile() call is written for Linux 2.6.33 or later. In principle, sendfile() can be more efficient than the combination of read/write used by the other approaches.
Upvotes: 8
Reputation: 129
List comprehensions are awesome for things like this:
with open('command.fort.13', 'w') as output:
    # Input order matches `cat command.info fort.13`.
    for f in ['command.info', 'fort.13']:
        with open(f) as infile:
            output.write(''.join([line for line in infile if line.strip()]))
Upvotes: 1
Reputation: 375574
You can simplify this in a few ways:
with open('command.fort.13', 'w') as outFile:
    with open('fort.13', 'r') as fort13, open('command.info', 'r') as com:
        for line in com:
            if line.strip():
                # Each line already ends with '\n', so write it as-is.
                outFile.write(line)
        for line in fort13:
            if line.strip():
                outFile.write(line)
More importantly, the shutil module has the copyfileobj function:
import shutil
with open('command.fort.13', 'w') as outFile:
    with open('command.info', 'r') as com:
        shutil.copyfileobj(com, outFile)
    with open('fort.13', 'r') as fort13:
        shutil.copyfileobj(fort13, outFile)
This doesn't skip the blank lines, but cat doesn't do that either, so I'm not sure you really want to.
Upvotes: 1
Reputation: 184161
def cat(outfilename, *infilenames):
    with open(outfilename, 'w') as outfile:
        for infilename in infilenames:
            with open(infilename) as infile:
                for line in infile:
                    if line.strip():
                        outfile.write(line)

# Inputs are listed in the same order as in `cat command.info fort.13`.
cat('command.fort.13', 'command.info', 'fort.13')
Upvotes: 8
Reputation: 798606
Iterating over a file yields lines.
for line in infile:
    outfile.write(line)
Upvotes: 1