Adam Amin

Reputation: 1456

Concatenate many files into one file without the header

I have three csv files (with the same name, e.g. A_bestInd.csv) that are located in different subfolders. I want to copy all of them into one file (e.g. All_A_bestInd.csv). To do that, I did the following:

{ find . -type f -name A_bestInd.csv -exec cat '{}' \; ; } >> All_A_bestInd.csv

The result of this command is the following:

Class   Conf        1   2   3   4 //header of file1
A       Reduction   5   1   2   1
A       Reduction   1   8   1   10
Class   Conf        1   2   3   4 //header of file2
A       No_red      2   1   3   2
A       No_red      3   6   1   9
Class   Conf        1   2   3   4 //header of file3
A       Reduction   5   5   8   9
A       Reduction   7   2   1   11

As you can see, the header of every file ends up in the output. How can I change my command to keep only one header and drop the rest?

Upvotes: 4

Views: 3542

Answers (3)

oguz ismail

Reputation: 50785

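Use awk to filter out the header line from every file but the first (this works unless you have thousands of files, in which case find may split them across several awk invocations and NR restarts in each one):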

find . -type f -name 'A_bestInd.csv' -exec awk 'NR==1 || FNR>1' {} + > 'All_A_bestInd.csv'

NR==1 || FNR>1 means: print the current line if it is the very first line of the whole input (NR, the line number counted across all files, is 1), or if it is not the first line of the current file (FNR, the line number within the current file, is greater than 1).
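To see the two counters side by side, here is a small sketch. The scratch directory and the file names a.txt and b.txt are made up for illustration and are not part of the question.

```shell
# Hypothetical scratch files, only for illustration.
tmp=$(mktemp -d)
printf 'header\na1\na2\n' > "$tmp/a.txt"
printf 'header\nb1\n' > "$tmp/b.txt"

# Print both counters in front of every line: NR keeps counting across
# files, FNR restarts at 1 whenever awk opens the next file.
counters=$(awk '{ print NR, FNR, $0 }' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$counters"

# The filter from the answer: keep line 1 of the whole input, plus every
# line that is not the first line of its own file.
merged=$(awk 'NR==1 || FNR>1' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$merged"
```

The second header line has NR 4 and FNR 1, so neither condition holds and it is skipped.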


$ cat A_bestInd.csv 
Class   Conf        1   2   3   4 //header of file3
A       Reduction   5   5   8   9
A       Reduction   7   2   1   11
$ 
$ cat foo/A_bestInd.csv 
Class   Conf        1   2   3   4 //header of file1
A       Reduction   5   1   2   1
A       Reduction   1   8   1   10
$ 
$ cat bar/A_bestInd.csv 
Class   Conf        1   2   3   4 //header of file2
A       No_red      2   1   3   2
A       No_red      3   6   1   9
$ 
$ find . -type f -name 'A_bestInd.csv' -exec awk 'NR==1 || FNR>1' {} + > 'All_A_bestInd.csv'
$
$ cat All_A_bestInd.csv 
Class   Conf        1   2   3   4 //header of file1
A       Reduction   5   1   2   1
A       Reduction   1   8   1   10
A       Reduction   5   5   8   9
A       Reduction   7   2   1   11
A       No_red      2   1   3   2
A       No_red      3   6   1   9
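If the file list could grow long enough for find to split it across several awk invocations, one possible workaround is to print the header once and then strip it from every file. This is only a sketch: it assumes all files share the same header, and the scratch directory and sample contents below are made up for illustration.

```shell
# Scratch layout loosely mirroring the question (paths and data are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/foo" "$tmp/bar"
printf 'Class Conf 1\nA Reduction 5\n' > "$tmp/foo/A_bestInd.csv"
printf 'Class Conf 1\nA No_red 2\n' > "$tmp/bar/A_bestInd.csv"

out="$tmp/All_A_bestInd.csv"
{
  # Header: take the first line of whatever file find reports first.
  find "$tmp" -type f -name 'A_bestInd.csv' -exec cat {} + | head -n 1
  # Data lines only: FNR is per file, so it does not matter how find
  # batches the files across awk invocations.
  find "$tmp" -type f -name 'A_bestInd.csv' -exec awk 'FNR>1' {} +
} > "$out"
cat "$out"
```

The output file is not picked up by find because -name 'A_bestInd.csv' matches the whole file name, not a suffix. The order of the data lines follows find's traversal order, which is not guaranteed.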

Upvotes: -1

William Pursell

Reputation: 212374

There are solutions with tail -n +2 and awk, but it seems to me the classic way to print all but the first line of a file is sed: sed -e 1d. So:

find . -type f -name A_bestInd.csv -exec sed -e 1d '{}' \; >> All_A_bestInd.csv

Upvotes: 1

John Kugelman

Reputation: 361849

Use tail -n +2 to trim the headers from all the files (the older tail +2 spelling is obsolete and not accepted everywhere).

find . -type f -name A_bestInd.csv -exec tail -n +2 {} \; >> All_A_bestInd.csv

To keep just one header you could combine it with head -n 1. Note that -quit is not part of POSIX find, though GNU find supports it.

{ find . -type f -name A_bestInd.csv -exec head -n 1 {} \; -quit
  find . -type f -name A_bestInd.csv -exec tail -n +2 {} \; ; } >> All_A_bestInd.csv

Upvotes: 2
