Air

Reputation: 8615

How can I combine a set of text files, leaving off the first line of each?

As part of a normal workflow, I receive sets of text files, each containing a header row. It's more convenient for me to work with these as a single file, but if I cat them naively, the header rows in files after the first cause problems.

The files tend to be large enough (10^3–10^5 lines, 5–50 MB) and numerous enough that it's awkward and/or tedious to do this in an editor or step by step, e.g.:

$ wc -l *
    20251 1.csv
   124520 2.csv
    31158 3.csv
   175929 total

$ tail -n 20250 1.csv > 1.tmp

$ tail -n 124519 2.csv > 2.tmp

$ tail -n 31157 3.csv > 3.tmp

$ cat *.tmp > combined.csv

$ wc -l combined.csv
175926 combined.csv

It seems like this should be doable in one line. I've isolated the arguments that I need but I'm having trouble figuring out how to match them up with tail and subtract 1 from the line total (I'm not comfortable with awk):

$ wc -l * | grep -v "total" | xargs -n 2
20251 foo.csv
124520 bar.csv
31158 baz.csv
87457 zappa.csv
7310 bingo.csv
29968 niner.csv
2086 hella.csv

$ wc -l * | grep -v "total" | xargs -n 2 | tail -n
tail: option requires an argument -- n
Try 'tail --help' for more information.
xargs: echo: terminated by signal 13

Upvotes: 0

Views: 155

Answers (4)

karakfa

Reputation: 67507

Another sed alternative

    sed -s 1d *.csv

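This deletes the first line of each input file. The -s option (a GNU sed extension) makes sed treat the files separately; without it, the inputs are read as one continuous stream and only the first line of the first file is deleted.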
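To see the difference, here is a minimal sketch, assuming GNU sed and two small scratch files that I made up for the demonstration (the file names and contents are hypothetical):

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 'id,name\n1,a\n2,b\n' > "$workdir/1.csv"
printf 'id,name\n3,c\n' > "$workdir/2.csv"

# With -s (GNU sed), line 1 is counted per file, so every header is dropped.
sed_separate=$(sed -s 1d "$workdir"/*.csv)

# Without -s, the files form one stream, so only the first header is dropped.
sed_stream=$(sed 1d "$workdir"/*.csv)

echo "with -s:";    printf '%s\n' "$sed_separate"
echo "without -s:"; printf '%s\n' "$sed_stream"
```

In the second result, the header of 2.csv should still be sitting in the middle of the data, which is exactly the problem described in the question.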

Upvotes: 0

anubhava

Reputation: 785481

Both tail and sed answers work fine.

For the sake of an alternative, here is an awk command that does the same job:

awk 'FNR > 1' *.csv > combined.csv

FNR is the record number within the current input file (it resets to 1 for each new file), so the condition FNR > 1 skips the first row of every file.
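If you would rather keep a single header row at the top of the combined file, a small variation should work. This is only a sketch on hypothetical scratch files, and it assumes every file carries the same header; NR is the record number across all inputs, so NR == 1 matches only the very first line read:

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 'id,name\n1,a\n' > "$workdir/1.csv"
printf 'id,name\n2,b\n' > "$workdir/2.csv"

# FNR > 1 drops line 1 of every file; NR == 1 lets the very first line through.
# The output name does not match *.csv, so a rerun will not read it as input.
awk 'NR == 1 || FNR > 1' "$workdir"/*.csv > "$workdir/combined.out"

cat "$workdir/combined.out"
```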

Upvotes: 3

Air

Reputation: 8615

You don't need to use wc -l to calculate the number of lines to output. tail can start its output at a given line instead: prefix the number with a + symbol when using the -n (or --lines) option, and -n +K prints from line K onward, so -n +2 skips only the first line. As described in the man page:

  -n, --lines=K            output the last K lines, instead of the last 10;
                             or use -n +K to output starting with the Kth

This makes combining all files in a directory without the first line of each file as simple as:

$ tail -q -n +2 * > combined.csv

$ wc -l *
    20251 foo.csv
   124520 bar.csv
    31158 baz.csv
    87457 zappa.csv
     7310 bingo.csv
    29968 niner.csv
     2086 hella.csv
   302743 combined.csv
   605493 total

The -q flag suppresses the "==> filename <==" headers that tail otherwise prints before each file's output when it is given more than one file.
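A minimal sketch of the effect of -q, using two hypothetical scratch files (names and contents are made up). It captures the output in variables rather than writing into the globbed directory, since an existing output file that matches the glob would be passed to tail as input on a rerun:

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 'h\n1\n2\n' > "$workdir/foo.csv"
printf 'h\n3\n' > "$workdir/bar.csv"

# -n +2 starts at line 2 of each file; -q leaves out the per-file headers.
with_q=$(tail -q -n +2 "$workdir"/*.csv)

# Same command without -q: each file's output is preceded by "==> name <==".
without_q=$(tail -n +2 "$workdir"/*.csv)

echo "with -q:";    printf '%s\n' "$with_q"
echo "without -q:"; printf '%s\n' "$without_q"
```

The glob expands in sorted order, so bar.csv is read before foo.csv here.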

Upvotes: 7

Cyrus

Reputation: 88756

With GNU sed:

sed -ns '2,$p' 1.csv 2.csv 3.csv > combined.csv

or

sed -ns '2,$p' *.csv > combined.csv
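If GNU sed is not available, -s may not be supported. One possible workaround is a plain loop that runs sed once per file, sketched here on hypothetical scratch files; this is an assumption about what would suit a non-GNU system, not something tested against every sed:

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 'h\n1\n' > "$workdir/1.csv"
printf 'h\n2\n3\n' > "$workdir/2.csv"

# One sed process per file, so line 1 is always that file's own header.
# The output name does not match *.csv, so the loop never reads it back in.
for f in "$workdir"/*.csv; do
    sed 1d "$f"
done > "$workdir/combined.out"

cat "$workdir/combined.out"
```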

Upvotes: 1
