chas

Reputation: 1645

concatenate files awk/linux

I have n files in a folder, each of which starts with lines like those shown below.

##contig=<ID=chr38,length=23914537>
##contig=<ID=chrX,length=123869142>
##contig=<ID=chrMT,length=16727>
##samtoolsVersion=0.1.19-44428cd
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  P922_120
chr1    412573  SNP74   A       C       2040.77 PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;DP=58;
chr1    602567  BICF2G630707977 A       G       877.77  PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;  
chr1    604894  BICF2G630707978 A       G       2044.77 PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1    693376  .       GCCCCC  GCCCC   761.73  .       AC=2;AC1=2;AF=1.00;AF1=1;

I want to concatenate all n files into a single file. Every line beginning with # should be deleted from all the files, except that the header line (#CHROM ...) is retained once at the top; the remaining rows from all the files are then concatenated. Example output is shown below:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  P922_120
chr1    412573  SNP74   A       C       2040.77 PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;DP=58;
chr1    602567  BICF2G630707977 A       G       877.77  PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;  
chr1    604894  BICF2G630707978 A       G       2044.77 PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1    693376  .       GCCCCC  GCCCC   761.73  .       AC=2;AC1=2;AF=1.00;AF1=1;

Upvotes: 0

Views: 256

Answers (5)

Mark Setchell

Reputation: 207465

Or you can use grep like this:

grep -vh "^##" *

The -v inverts the match, so the command prints every line that does NOT start with ## from all files, and -h suppresses the filename prefixes.

Or, if you want to emit just one header line at the start (written outside the folder so that * does not pick up the output file),

(grep -hm1 '^#CHROM' * | head -n 1; grep -hv '^#' *) > ../out.txt
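As a small worked illustration of the header-once idea, here is a sketch that runs the two grep passes on throwaway sample files. The scratch directory, the a.vcf and b.vcf names, and the two-column sample rows are all hypothetical, and -m is an extension found in GNU and BSD grep rather than a POSIX option.

```shell
# Hypothetical scratch directory with two tiny sample inputs.
workdir=$(mktemp -d)
printf '##meta=1\n#CHROM\tPOS\nchr1\t100\n' > "$workdir/a.vcf"
printf '##meta=2\n#CHROM\tPOS\nchr2\t200\n' > "$workdir/b.vcf"

merged=$(
  cd "$workdir" || exit 1
  # Header: -m1 stops after the first match per file, head keeps only one line overall.
  grep -h -m1 '^#CHROM' -- *.vcf | head -n 1
  # Body: every line that does not start with '#', from all files, no filename prefix.
  grep -h -v '^#' -- *.vcf
)
printf '%s\n' "$merged"
rm -rf "$workdir"
```

The second pass uses '^#' rather than '^##' so that the #CHROM line of each later file is not repeated in the body.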

Upvotes: 0

toth

Reputation: 2552

I believe what you want is

awk '$0 ~ /^##/ { next } $0 ~ /^#/ && !printed_header { print; printed_header = 1 } $0 !~ /^#/ { print }' file1 file2 file3

Upvotes: 0

muzio

Reputation: 310

If I understood correctly, you could do:

echo "#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  P922_120" > mergedfile
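for file in $FILES; do grep -v "^#" "$file" >> mergedfile; done

Note that $FILES could come from ls (a glob such as * is safer if file names contain spaces), and -v is grep's invert-match flag. The pattern is anchored with ^ so that only lines starting with # are dropped; a bare "#" would also remove any data row that merely contains a # somewhere.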
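A variation on the loop, sketched under some assumptions: instead of typing the header by hand, it takes the header from the first file, and it writes the result outside the input folder so the output is never read back as an input. The folder, the s1.txt and s2.txt names, and the sample rows are hypothetical.

```shell
indir=$(mktemp -d)      # hypothetical input folder
merged_file=$(mktemp)   # output kept outside the input folder
printf '##meta=1\n#CHROM\tPOS\nchr1\t100\n' > "$indir/s1.txt"
printf '##meta=2\n#CHROM\tPOS\nchr2\t200\n' > "$indir/s2.txt"

first=1
for file in "$indir"/*; do
  if [ "$first" -eq 1 ]; then
    # First file: drop only the ## metadata, so its #CHROM header is kept.
    grep -v '^##' "$file" > "$merged_file"
    first=0
  else
    # Later files: drop every line starting with #, header included.
    grep -v '^#' "$file" >> "$merged_file"
  fi
done

loop_merged=$(cat "$merged_file")
printf '%s\n' "$loop_merged"
rm -rf "$indir" "$merged_file"
```

Iterating over a quoted glob ("$indir"/*) rather than over the output of ls keeps file names with spaces intact.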

Upvotes: 0

Geraint Anderson

Reputation: 3393

Specifically with awk:

awk '$0!~/^#/{print $0}' file1 file2 file3 > outputfile

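Broken down: you check whether the line ($0) does not match (!~) the pattern for a line beginning with # (/^#/) and, if so, print it. You pass the input files as arguments and write the result to (>) outputfile. Note that this also drops the #CHROM header line, since it starts with # as well.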
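If the single #CHROM header should survive, the same awk approach can be extended with a counter. This is only a sketch: the variable name seen and the file1/file2 sample contents are made up for illustration.

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory
printf '##meta=1\n#CHROM\tPOS\nchr1\t100\n' > "$workdir/file1"
printf '##meta=2\n#CHROM\tPOS\nchr2\t200\n' > "$workdir/file2"

awk_merged=$(awk '
  /^##/ { next }                       # drop ## metadata lines
  /^#/  { if (!seen++) print; next }   # print a header line only the first time one is seen
  { print }                            # everything else is a data row
' "$workdir/file1" "$workdir/file2")
printf '%s\n' "$awk_merged"
rm -rf "$workdir"
```

Because awk reads all the files in one pass, the counter carries over from one file to the next, which is what limits the header to a single copy.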

Upvotes: 2

William Pursell

Reputation: 212268

Your problem is not terribly well specified, but I think you are just looking for:

sed '/^##/d' $FILE_LIST > output

where FILE_LIST is the list of input files (you may be able to use *).

Upvotes: 0
