Reputation: 1645
I have n files in a folder, each of which starts with lines like the ones shown below:
##contig=<ID=chr38,length=23914537>
##contig=<ID=chrX,length=123869142>
##contig=<ID=chrMT,length=16727>
##samtoolsVersion=0.1.19-44428cd
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P922_120
chr1 412573 SNP74 A C 2040.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;DP=58;
chr1 602567 BICF2G630707977 A G 877.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1 604894 BICF2G630707978 A G 2044.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1 693376 . GCCCCC GCCCC 761.73 . AC=2;AC1=2;AF=1.00;AF1=1;
There are n such files. I want to concatenate them all into a single file. Every line beginning with # should be deleted from every file and the remaining rows from all the files concatenated, except that the #CHROM header line should be kept once, at the top. Example output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P922_120
chr1 412573 SNP74 A C 2040.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;DP=58;
chr1 602567 BICF2G630707977 A G 877.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1 604894 BICF2G630707978 A G 2044.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1 693376 . GCCCCC GCCCC 761.73 . AC=2;AC1=2;AF=1.00;AF1=1;
Upvotes: 0
Views: 256
Reputation: 207465
Or you can use grep like this:
grep -vh "^##" *
The -v flag inverts the match, so the command means: print every line NOT starting with ##, from all files, without prefixing file names (-h). Note that this keeps the #CHROM header line of every file.
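Or, if you want to emit only 1 header line at the start:
(grep -h -m1 '^#CHROM' * | head -n 1; grep -hv '^#' *) > ../out.txt
-m1 stops after the first match in each file, so head keeps just one of those header lines, and the second grep drops every line starting with #. Writing the result outside the folder (../out.txt) keeps * from matching the output file.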
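The single-header idea can be checked on throwaway data. In this sketch the folder in/, the files a.vcf and b.vcf, and their trimmed rows are made-up samples, not the asker's files. It also assumes a grep that supports -m (GNU and BSD grep do; POSIX grep does not require it).

```shell
#!/bin/sh
# Hypothetical sample data: two small files in in/ inside a temporary folder.
cd "$(mktemp -d)" || exit 1
mkdir in
printf '##contig=<ID=chrX,length=123869142>\n#CHROM POS ID\nchr1 412573 SNP74\n' > in/a.vcf
printf '##samtoolsVersion=0.1.19-44428cd\n#CHROM POS ID\nchr1 602567 BICF2G630707977\n' > in/b.vcf

# First grep: one #CHROM line per file (-m1), cut down to a single line by head.
# Second grep: every line that does not start with # (all header lines dropped).
# out.txt lives outside in/, so the in/* glob never picks it up.
(grep -h -m1 '^#CHROM' in/* | head -n 1; grep -hv '^#' in/*) > out.txt
cat out.txt
```

out.txt should contain the #CHROM line followed by the two data rows.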
Upvotes: 0
Reputation: 2552
I believe what you want is:
awk '$0 ~ /^##/ { next } $0 ~ /^#/ && !printed_header { print; printed_header = 1 } $0 !~ /^#/ { print }' file1 file2 file3
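Laid out over several lines, the same three-rule awk logic is easier to follow. The point is that awk treats all input files as one stream, so the printed_header flag keeps its value from one file to the next and only the first #CHROM line gets through. The file names a.vcf and b.vcf and their contents are hypothetical samples.

```shell
#!/bin/sh
# Hypothetical sample files in a temporary folder.
cd "$(mktemp -d)" || exit 1
printf '##contig=<ID=chrX,length=123869142>\n#CHROM POS ID\nchr1 412573 SNP74\n' > a.vcf
printf '##samtoolsVersion=0.1.19-44428cd\n#CHROM POS ID\nchr1 602567 BICF2G630707977\n' > b.vcf

awk '
  /^##/ { next }                                     # meta-information lines: drop
  /^#/  { if (!printed_header) { print; printed_header = 1 }
          next }                                     # column header: first one only
        { print }                                    # data lines: keep
' a.vcf b.vcf > merged.txt
cat merged.txt
```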
Upvotes: 0
Reputation: 310
If I understood correctly, you could do:
echo "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P922_120" > mergedfile
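for file in $FILES; do grep -v "^#" "$file" >> mergedfile; done
Note that $FILES could be $(ls), and the -v option in grep is the non-match flag. The ^ anchor matters: plain grep -v "#" would also drop any data line that merely contains a # somewhere.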
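Here is a sketch of the loop approach run on made-up data. Two things differ from the answer and are my assumptions: the header is copied from the first input file instead of being typed by hand, and a glob replaces $(ls) so that file names with spaces survive. The folder in/ and the files a.vcf and b.vcf are hypothetical, and grep -m is a GNU/BSD extension.

```shell
#!/bin/sh
# Hypothetical sample data in in/ inside a temporary folder.
cd "$(mktemp -d)" || exit 1
mkdir in
printf '##contig=<ID=chrX,length=123869142>\n#CHROM POS ID\nchr1 412573 SNP74\n' > in/a.vcf
printf '##samtoolsVersion=0.1.19-44428cd\n#CHROM POS ID\nchr1 602567 BICF2G630707977\n' > in/b.vcf

# Put the input file names into "$@" and take the header from the first one.
set -- in/*
grep -m1 '^#CHROM' "$1" > mergedfile

# Quoting "$file" keeps names with spaces intact; '^#' drops only lines
# that start with #. Redirecting once after "done" appends all output together.
for file in "$@"; do
  grep -v '^#' "$file"
done >> mergedfile
cat mergedfile
```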
Upvotes: 0
Reputation: 3393
Specifically with awk:
awk '$0!~/^#/{print $0}' file1 file2 file3 > outputfile
Broken down, you are checking whether the line ($0) does not match (!~) a string beginning with # (/^#/) and, if so, printing the line. You take the input files and write to (>) outputfile. Note that this drops the #CHROM header line as well.
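If the #CHROM header should survive once, one possible variant of the same test adds a second condition that is true only for the first #CHROM line seen. The sample files a.vcf and b.vcf are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample files in a temporary folder.
cd "$(mktemp -d)" || exit 1
printf '##contig=<ID=chrX,length=123869142>\n#CHROM POS ID\nchr1 412573 SNP74\n' > a.vcf
printf '##samtoolsVersion=0.1.19-44428cd\n#CHROM POS ID\nchr1 602567 BICF2G630707977\n' > b.vcf

# seen starts at 0, so !seen++ is true only for the first #CHROM line;
# every other line is printed only if it does not start with #.
awk '(/^#CHROM/ && !seen++) || !/^#/' a.vcf b.vcf > outputfile
cat outputfile
```

A pattern with no action prints the line by default, so no {print $0} is needed.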
Upvotes: 2
Reputation: 212268
Your problem is not terribly well specified, but I think you are just looking for:
sed '/^##/d' $FILE_LIST > output
where FILE_LIST is the list of input files (you may be able to use *). This keeps the #CHROM header line of every file, though.
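To keep just one header with sed, one option is to run sed twice: once to print the first #CHROM line of the first file and quit, and once to delete every # line from all files. The folder in/ and the files a.vcf and b.vcf are hypothetical samples.

```shell
#!/bin/sh
# Hypothetical sample data in in/ inside a temporary folder.
cd "$(mktemp -d)" || exit 1
mkdir in
printf '##contig=<ID=chrX,length=123869142>\n#CHROM POS ID\nchr1 412573 SNP74\n' > in/a.vcf
printf '##samtoolsVersion=0.1.19-44428cd\n#CHROM POS ID\nchr1 602567 BICF2G630707977\n' > in/b.vcf

set -- in/*
{
  # Header: print the first #CHROM line of the first file, then quit.
  sed -n '/^#CHROM/{p;q;}' "$1"
  # Body: delete every line starting with # (## and #CHROM) from all files.
  sed '/^#/d' "$@"
} > output
cat output
```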
Upvotes: 0