Reputation: 7832
What would be a good way to concatenate several files while removing their header lines (the number of header lines is not known in advance), keeping only the first file's header as the header of the new concatenated file?
I'd like to do this in Python, but awk or other tools would also work, as long as I can call the Unix command through subprocess.
Note: The header lines all start with #.
Upvotes: 3
Views: 5565
Reputation: 36272
Using GNU awk:
awk '
ARGIND == 1 { print; next }
/^[[:space:]]*#/ { next }
{ print }
' *.txt
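For comparison, the same logic can be sketched in Python. This is only a rough equivalent of the awk script above, assuming text files that can be streamed line by line; the names `HEADER_RE` and `merge_keep_first_header` are made up for illustration.

```python
import re

# Optional leading whitespace followed by '#', like the awk pattern above.
HEADER_RE = re.compile(r"^\s*#")


def merge_keep_first_header(paths):
    """Yield every line of the first file, then only non-header lines of the rest.

    `paths` is a list of file names (hypothetical helper, not a library function).
    """
    for index, path in enumerate(paths):
        with open(path) as handle:
            for line in handle:
                # Like ARGIND == 1 in gawk: the first file passes through untouched.
                if index == 0 or not HEADER_RE.match(line):
                    yield line
```

It could be used as `sys.stdout.writelines(merge_keep_first_header(sys.argv[1:]))` after `import sys`.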
Upvotes: 1
Reputation: 77135
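Another awk version (it keeps only the first header line it encounters; the pattern is anchored so that a # in the middle of a data line is not treated as a header):
awk '!flag && /^#/ { print; flag=1; next } flag && /^#/ { next } 1' f1 f2 f3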
Upvotes: 0
Reputation: 13220
I would do it as follows:
(cat file1; sed '/^#/d' file2 file3 file4) > newFile
Upvotes: 7
Reputation: 200373
I'd probably do it like this:
#!/usr/bin/env python
import sys

# Print every line of the first file; for the others, skip lines starting with "#".
for i in range(1, len(sys.argv)):
    with open(sys.argv[i], "r") as f:
        for line in f:
            if i == 1 or not line.startswith("#"):
                print(line.rstrip('\n'))
Run the script with the files as arguments and redirect the output to the result file:
$ ./combine.py foo.txt bar.txt baz.txt > result.txt
The header(s) will be taken from the first file of the argument list (foo.txt in the example above).
Upvotes: 1
Reputation: 251041
Something like this using Python:
files = ["file1", "file2", "file3"]
with open("output_file", "w") as outfile:
    with open(files[0]) as f1:
        for line in f1:  # keep the header from file1
            outfile.write(line)
    for x in files[1:]:
        with open(x) as f1:
            for line in f1:
                if not line.startswith("#"):
                    outfile.write(line)
You can also use the fileinput module here:
This module implements a helper class and functions to quickly write a loop over standard input or a list of files.
import fileinput
header_over = False
with open("out_file", "w") as outfile:
    for line in fileinput.input():
        if line.startswith("#") and not header_over:
            outfile.write(line)
        elif not line.startswith("#"):
            outfile.write(line)
            header_over = True
Usage:
$ python so.py file1 file2 file3
input:
file1:
#header file1
foo
bar
file2:
#header file2
spam
eggs
file3:
#header file3
python
file
output:
#header file1
foo
bar
spam
eggs
python
file
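One edge case the loops above do not cover: if an input file does not end with a newline, its last line will be glued to the first line of the next file. A minimal sketch that guards against this, assuming plain text files; the function name `concat_without_headers` and its `prefix` parameter are illustrative and not part of the original answer.

```python
def concat_without_headers(paths, out_path, prefix="#"):
    """Copy the first file whole, then the other files minus lines starting with `prefix`."""
    with open(out_path, "w") as out:
        for index, path in enumerate(paths):
            with open(path) as src:
                for line in src:
                    if index > 0 and line.startswith(prefix):
                        continue  # header line of a later file: skip it
                    # A final line may lack its newline; add one so the next
                    # file's first line starts on a fresh line.
                    out.write(line if line.endswith("\n") else line + "\n")
```

For example, `concat_without_headers(["file1", "file2", "file3"], "out_file")` would produce the same output as shown above for those three inputs.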
Upvotes: 5
Reputation: 23374
You could call a shell pipeline by passing shell=True to subprocess.Popen:
cat f.1 ; grep -v -h '^#' f.2 f.3 f.4 f.5
Quick example:
import sys, subprocess
p = subprocess.Popen('''cat f.1 ; grep -v -h '^#' f.2 f.3 f.4 f.5''', shell=True,
                     stdout=sys.stdout)
p.wait()
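If the file names come from user input, the same idea can be sketched without shell=True, so that no quoting is needed. This is a rough variant, assuming Python 3.5+ and a grep on the PATH that supports -v and -h; the function name `concat_with_grep` is hypothetical.

```python
import shutil
import subprocess


def concat_with_grep(first, others, out_path):
    """Write `first` unchanged, then the non-'#' lines of `others`, to `out_path`."""
    with open(out_path, "wb") as out:
        with open(first, "rb") as src:
            shutil.copyfileobj(src, out)  # plays the role of `cat f.1`
        if others:
            out.flush()  # make sure grep's output lands after the copied bytes
            result = subprocess.run(["grep", "-v", "-h", "^#"] + list(others),
                                    stdout=out)
            # grep exits with 1 when it selects no lines; that is not an error here.
            if result.returncode > 1:
                raise RuntimeError("grep failed with exit code %d" % result.returncode)
```

A call such as `concat_with_grep("f.1", ["f.2", "f.3"], "newFile")` would mirror the pipeline above, with the arguments passed to grep as a list instead of a shell string.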
Upvotes: 1
Reputation:
Try this:
def combine(*files):
    with open("result.txt", "w") as result:
        for index, name in enumerate(files):
            with open(name) as f:
                for line in f:
                    # Keep "#" lines only from the first file.
                    if index == 0 or not line.strip().startswith("#"):
                        result.write(line)  # the line keeps its trailing newline

combine("file1.txt", "file2.txt")
file1.txt:
#header2
body2
file2.txt:
#header2
body2
result.txt:
#header2
body2
body2
Upvotes: 1