Dnaiel

Reputation: 7832

Concatenate several files, removing header lines

What would be a good way to concatenate several files while removing the header lines (the number of header lines is not known in advance) and keeping the first file's header as the header of the new concatenated file?

I'd like to do this in Python, but awk or other tools would also work, as long as I can call the Unix command via subprocess.

Note: The header lines all start with #.

Upvotes: 3

Views: 5565

Answers (7)

Birei

Reputation: 36272

Using GNU awk:

awk '
    ARGIND == 1 { print; next } 
    /^[[:space:]]*#/ { next }
    { print }
' *.txt
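
Since the question prefers Python, the same rule (print everything from the first file, skip lines whose first non-blank character is # in the others) could be sketched roughly as below. This is only an illustration, not part of the awk answer: the function name concat_skip_headers and its parameters are hypothetical, and it avoids ARGIND, which is a gawk extension.

```python
def concat_skip_headers(paths, out):
    """Write all of the first file, then the non-header lines of the rest.

    `paths` is a list of file names and `out` is any writable text stream;
    both names are assumptions made for this sketch.
    """
    for index, path in enumerate(paths):
        with open(path) as handle:
            for line in handle:
                # Roughly mirrors the awk pattern /^[[:space:]]*#/: a header
                # line is one whose first non-blank character is '#'.
                is_header = line.lstrip().startswith("#")
                if index == 0 or not is_header:
                    out.write(line)
```

It could be called as concat_skip_headers(sys.argv[1:], sys.stdout) from a small script, with the output redirected to the result file.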

Upvotes: 1

jaypal singh

Reputation: 77135

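Another awk version (note that it keeps only the first header line it encounters and drops every later line starting with #):

awk '!flag && /^#/ { print; flag=1; next } flag && /^#/ { next } 1' f1 f2 f3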

Upvotes: 0

Alper

Reputation: 13220

I would do the following:

(cat file1; sed '/^#/d' file2 file3 file4) > newFile

Upvotes: 7

Ansgar Wiechers

Reputation: 200373

I'd probably do it like this:

#!/usr/bin/env python3

import sys

for i, path in enumerate(sys.argv[1:], start=1):
    with open(path, "r") as f:
        for line in f:
            # Keep every line of the first file; drop '#' lines from the rest.
            if i == 1 or not line.startswith("#"):
                print(line.rstrip("\n"))

Run the script with the files as arguments and redirect the output to the result file:

$ ./combine.py foo.txt bar.txt baz.txt > result.txt

The header(s) will be taken from the first file of the argument list (foo.txt in the example above).

Upvotes: 1

Ashwini Chaudhary

Reputation: 251041

Something like this using Python:

files = ["file1","file2","file3"]

with open("output_file","w") as outfile:
    with open(files[0]) as f1:
        for line in f1:        # copy file1 in full, header included
            outfile.write(line)

    for x in files[1:]:
        with open(x) as f1:
            for line in f1:
                if not line.startswith("#"):
                    outfile.write(line)

You can also use the fileinput module here:

This module implements a helper class and functions to quickly write a loop over standard input or a list of files.

import fileinput
header_over = False
with open("out_file","w") as outfile:
    for line in fileinput.input():
        if line.startswith("#") and not header_over:
            outfile.write(line)
        elif not line.startswith("#"):
            outfile.write(line)
            header_over = True

Usage: $ python so.py file1 file2 file3

input:

file1:

#header file1
foo
bar

file2:

#header file2
spam
eggs

file3:

#header file3
python
file

output:

#header file1
foo
bar

spam
eggs

python
file
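
The fileinput loop above tells headers apart with a flag, so a # line that appears after the first data line of file1 is dropped too. If the first file should instead be copied whole, one possible variation is to count files with isfirstline(). This is a sketch under assumptions: the function name concat_first_file_whole and its parameters are made up for illustration, and an empty first file is skipped, so the next non-empty file is treated as the first one.

```python
import fileinput


def concat_first_file_whole(paths, out_path):
    """Copy the first non-empty file as-is; drop '#' lines from later files."""
    if not paths:
        # With no file names, fileinput would fall back to reading stdin.
        raise ValueError("at least one input file is required")
    files_seen = 0
    with open(out_path, "w") as out, fileinput.input(files=paths) as stream:
        for line in stream:
            # isfirstline() is true for the first line read from each file.
            if stream.isfirstline():
                files_seen += 1
            if files_seen == 1 or not line.startswith("#"):
                out.write(line)
```

For example, concat_first_file_whole(["file1", "file2", "file3"], "out_file") would write all of file1 followed by the non-# lines of file2 and file3.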

Upvotes: 5

iruvar

Reputation: 23374

You could call a shell pipeline by passing shell=True to subprocess.Popen:

cat f.1 ;  grep -v -h '^#' f.2 f.3 f.4 f.5

Quick example:

import sys, subprocess
p = subprocess.Popen('''cat f.1 ;  grep -v -h '^#' f.2 f.3 f.4 f.5''', shell=True,
stdout=sys.stdout)
p.wait()
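
If the file names come from user input or may contain spaces, the same two commands can also be run without a shell by passing argument lists. The following is a minimal sketch, assuming cat and grep are on the PATH; the function name concat_via_tools and its parameters are hypothetical.

```python
import subprocess


def concat_via_tools(first, others, out_path):
    """Copy `first` whole, then append the non-'#' lines of `others`."""
    with open(out_path, "wb") as out:
        # Equivalent of `cat first`; both commands write to the same file.
        subprocess.run(["cat", first], stdout=out, check=True)
        if others:
            # Equivalent of `grep -v -h '^#' others...`. grep exits with
            # status 1 when it selects no lines, so only a status above 1
            # is treated as an error here.
            result = subprocess.run(
                ["grep", "-v", "-h", "^#", "--", *others], stdout=out
            )
            if result.returncode > 1:
                raise subprocess.CalledProcessError(
                    result.returncode, result.args
                )
```

For example, concat_via_tools("f.1", ["f.2", "f.3", "f.4", "f.5"], "newFile") would correspond to the pipeline above with its output redirected to newFile.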

Upvotes: 1

user1786283

Reputation:

Try this:

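def combine(*files):
    # Write every line of the first file, then only the non-header lines
    # of the remaining files. Lines are written unchanged so that the
    # newlines are preserved.
    with open("result.txt", "w") as result:
        for index, name in enumerate(files):
            with open(name, "r") as f:
                for line in f:
                    if index == 0 or not line.strip().startswith("#"):
                        result.write(line)


combine("file1.txt", "file2.txt")

file1.txt:

#header2
body2

file2.txt:

#header2
body2

result.txt:

#header2
body2
body2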

Upvotes: 1
