Reputation: 8615

How can I combine a set of text files, leaving off the first line of each?

As part of a normal workflow, I receive sets of text files, each containing a header row. It's more convenient for me to work with these as a single file, but if I cat them naively, the header rows in files after the first cause problems.

The files tend to be large enough (10³–10⁵ lines, 5–50 MB) and numerous enough that it's awkward and/or tedious to do this in an editor or step-by-step, e.g.:

$ wc -l *
    20251 1.csv
   124520 2.csv
    31158 3.csv
   175929 total

$ tail -n 20250 1.csv > 1.tmp

$ tail -n 124519 2.csv > 2.tmp

$ tail -n 31157 3.csv > 3.tmp

$ cat *.tmp > combined.csv

$ wc -l combined.csv
175926 combined.csv

It seems like this should be doable in one line. I've isolated the arguments that I need but I'm having trouble figuring out how to match them up with tail and subtract 1 from the line total (I'm not comfortable with awk):

$ wc -l * | grep -v "total" | xargs -n 2
20251 foo.csv
124520 bar.csv
31158 baz.csv
87457 zappa.csv
7310 bingo.csv
29968 niner.csv
2086 hella.csv

$ wc -l * | grep -v "total" | xargs -n 2 | tail -n
tail: option requires an argument -- n
Try 'tail --help' for more information.
xargs: echo: terminated by signal 13

Upvotes: 0

Answers (4)

karakfa

Reputation: 67507

Another sed alternative:

    sed -s 1d *.csv

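This deletes the first line of each input file. The -s (--separate) option is a GNU sed extension that treats each file separately; without it, sed reads all the files as one continuous stream and deletes only the first line of the first file.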
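To see the difference -s makes, here is a small sketch on two throwaway files. The scratch directory, file names, and contents are made up for illustration, and it assumes GNU sed is the sed on your PATH:

```shell
# Scratch directory with two tiny sample files (hypothetical data).
workdir=$(mktemp -d)
printf 'id,name\n1,a\n2,b\n' > "$workdir/one.csv"
printf 'id,name\n3,c\n' > "$workdir/two.csv"

# With -s (GNU sed), line numbering restarts for each file,
# so the header of every file is deleted.
with_s=$(sed -s 1d "$workdir"/one.csv "$workdir"/two.csv)

# Without -s, the inputs form one stream, so only the very first
# line is deleted and the header of two.csv survives.
without_s=$(sed 1d "$workdir"/one.csv "$workdir"/two.csv)

printf 'with -s:\n%s\n\nwithout -s:\n%s\n' "$with_s" "$without_s"
```

The second result still contains the id,name line from two.csv, which is exactly the problem described in the question.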

Upvotes: 0

anubhava

Reputation: 785481

Both tail and sed answers work fine.

As an alternative, here is an awk command that does the same job:

awk 'FNR > 1' *.csv > combined.csv

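FNR is the record number within the current file and resets to 1 at the start of each input file, so the condition FNR > 1 skips the first row of every file.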
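If you would rather keep a single header row at the top of the combined file, a small variation should do it. This is only a sketch: it assumes every file has the same header and that the first file is not empty, and the scratch directory and file names are made up for illustration.

```shell
# Scratch directory with two tiny sample files (hypothetical data).
workdir=$(mktemp -d)
printf 'id,name\n1,a\n2,b\n' > "$workdir/one.csv"
printf 'id,name\n3,c\n' > "$workdir/two.csv"

# NR counts records across all inputs; FNR restarts at 1 for each file.
# NR == 1 keeps the very first line read (the header of the first file);
# FNR > 1 keeps every non-header line of every file.
awk 'NR == 1 || FNR > 1' "$workdir"/one.csv "$workdir"/two.csv > "$workdir/combined.out"

cat "$workdir/combined.out"
```

The output is written to combined.out rather than a .csv name so that it can never be mistaken for one of the inputs.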

Upvotes: 3

Air

Reputation: 8615

You don't need to use wc -l to calculate the number of lines to output. tail can skip the first line (or, more generally, the first K-1 lines) if you prefix the count with a + sign in the -n (or --lines) option, as described in the man page:

  -n, --lines=K            output the last K lines, instead of the last 10;
                             or use -n +K to output starting with the Kth

This makes combining all files in a directory without the first line of each file as simple as:

$ tail -q -n +2 * > combined.csv

$ wc -l *
    20251 foo.csv
   124520 bar.csv
    31158 baz.csv
    87457 zappa.csv
     7310 bingo.csv
    29968 niner.csv
     2086 hella.csv
   302743 combined.csv
   605493 total

The -q flag suppresses the ==> filename <== headers that tail otherwise prints before each file's contents when it is given more than one file.
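One caveat with a bare * glob: once combined.csv exists in the same directory, a second run will match it as an input too. A minimal sketch of one way around that is to match only the inputs and write the result under a name the glob cannot match. The scratch directory and the combined.out name are made up for illustration.

```shell
# Scratch directory with two tiny sample files (hypothetical data).
workdir=$(mktemp -d)
printf 'id,name\n1,a\n2,b\n' > "$workdir/one.csv"
printf 'id,name\n3,c\n' > "$workdir/two.csv"

# Glob only the .csv inputs and write to a name outside that pattern,
# so the output file is never read back in as an input.
tail -q -n +2 "$workdir"/*.csv > "$workdir/combined.out"

# Running the same command again gives the same result.
tail -q -n +2 "$workdir"/*.csv > "$workdir/combined.out"

wc -l < "$workdir/combined.out"
```

Writing the output to a different directory works just as well if you want to keep the .csv extension.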

Upvotes: 7

Cyrus

Reputation: 88756

With GNU sed:

sed -ns '2,$p' 1.csv 2.csv 3.csv > combined.csv

sed -ns '2,$p' *.csv > combined.csv
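If your sed is not GNU sed and lacks -s, a plain loop that runs sed once per file should give the same result. This is a sketch rather than a drop-in replacement, and the scratch directory and file names are made up for illustration.

```shell
# Scratch directory with two tiny sample files (hypothetical data).
workdir=$(mktemp -d)
printf 'id,name\n1,a\n2,b\n' > "$workdir/one.csv"
printf 'id,name\n3,c\n' > "$workdir/two.csv"

# Each sed invocation sees a single file, so "1d" always removes
# that file's header. The redirection applies to the whole loop.
for f in "$workdir"/*.csv; do
    sed 1d "$f"
done > "$workdir/combined.out"

cat "$workdir/combined.out"
```

This starts one sed process per file, which is slower than a single invocation but should not matter for a handful of files.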

Upvotes: 1
