Reputation: 8615
As part of a normal workflow, I receive sets of text files, each containing a header row. It's more convenient for me to work with these as a single file, but if I cat them naively, the header rows in files after the first cause problems.
The files tend to be large enough (10^3–10^5 lines, 5–50 MB) and numerous enough that it's awkward and/or tedious to do this in an editor or step by step, e.g.:
$ wc -l *
20251 1.csv
124520 2.csv
31158 3.csv
175929 total
$ tail -n 20250 1.csv > 1.tmp
$ tail -n 124519 2.csv > 2.tmp
$ tail -n 31157 3.csv > 3.tmp
$ cat *.tmp > combined.csv
$ wc -l combined.csv
175926 combined.csv
It seems like this should be doable in one line. I've isolated the arguments that I need, but I'm having trouble figuring out how to match them up with tail and subtract 1 from the line total (I'm not comfortable with awk):
$ wc -l * | grep -v "total" | xargs -n 2
20251 foo.csv
124520 bar.csv
31158 baz.csv
87457 zappa.csv
7310 bingo.csv
29968 niner.csv
2086 hella.csv
$ wc -l * | grep -v "total" | xargs -n 2 | tail -n
tail: option requires an argument -- n
Try 'tail --help' for more information.
xargs: echo: terminated by signal 13
Upvotes: 0
Views: 155
Reputation: 67507
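Another sed alternative:
sed -s 1d *.csv
This deletes the first line from each input file. The -s (--separate) option is a GNU sed extension; without it, sed treats all inputs as one continuous stream and deletes only the first line of the first file.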
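To make the effect of -s concrete, here is a minimal sketch, assuming GNU sed and two small made-up CSV files in a throwaway directory (the file names and contents are hypothetical, not taken from the question):

```shell
# Throwaway directory with two tiny hypothetical CSV files.
dir=$(mktemp -d)
printf 'id,name\n1,a\n2,b\n' > "$dir/1.csv"
printf 'id,name\n3,c\n' > "$dir/2.csv"

# With -s (GNU sed), line numbers restart for each file,
# so "1d" removes the header of every file.
with_s=$(sed -s 1d "$dir/1.csv" "$dir/2.csv")

# Without -s, the inputs form one continuous stream,
# so only the header of the first file is removed.
without_s=$(sed 1d "$dir/1.csv" "$dir/2.csv")

printf 'with -s:\n%s\n\nwithout -s:\n%s\n' "$with_s" "$without_s"
rm -r "$dir"
```

The second result still contains the header line from 2.csv, which is exactly the problem described in the question.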
Upvotes: 0
Reputation: 785481
Both the tail and sed answers work fine.
For the sake of an alternative, here is an awk command that does the same job:
awk 'FNR > 1' *.csv > combined.csv
The FNR > 1 condition skips the first row of each file, because FNR is the line number within the current input file.
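As a rough illustration of how FNR behaves, the sketch below runs the command on two made-up files (hypothetical names and contents). The second command is a variant that is not part of this answer: it assumes you want to keep exactly one header row, and relies on NR counting lines across all inputs.

```shell
# Throwaway directory with two tiny hypothetical CSV files.
dir=$(mktemp -d)
printf 'id,name\n1,a\n2,b\n' > "$dir/1.csv"
printf 'id,name\n3,c\n' > "$dir/2.csv"

# FNR restarts at 1 for every input file, so FNR > 1 drops each header.
no_headers=$(awk 'FNR > 1' "$dir/1.csv" "$dir/2.csv")

# Variant (an assumption about what may be wanted, not from the answer):
# NR == 1 is true only for the very first line read, so one header survives.
one_header=$(awk 'NR == 1 || FNR > 1' "$dir/1.csv" "$dir/2.csv")

printf 'no headers:\n%s\n\none header:\n%s\n' "$no_headers" "$one_header"
rm -r "$dir"
```

If the combined file is written into the same directory, a later run of *.csv would also match combined.csv, so writing the output elsewhere or under a different extension may be safer.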
Upvotes: 3
Reputation: 8615
You don't need to use wc -l to calculate the number of lines to output; tail can skip the first line (or, in general, the first K-1 lines) just by adding a + symbol when using the -n (or --lines) option, as described in the man page:
-n, --lines=K output the last K lines, instead of the last 10;
or use -n +K to output starting with the Kth
This makes combining all files in a directory without the first line of each file as simple as:
$ tail -q -n +2 * > combined.csv
$ wc -l *
20251 foo.csv
124520 bar.csv
31158 baz.csv
87457 zappa.csv
7310 bingo.csv
29968 niner.csv
2086 hella.csv
302743 combined.csv
605493 total
The -q flag suppresses the "==> filename <==" headers that tail otherwise prints before each file's output when it is given multiple files.
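The following is a small sketch of the same idea on made-up data (hypothetical file names and contents), showing the output both with and without -q. It writes the combined file outside the input directory, which is an assumption on my part to keep the glob from picking up the output on a second run.

```shell
# Throwaway input directory and a separate output file.
dir=$(mktemp -d)
out=$(mktemp)
printf 'id,name\n1,a\n2,b\n' > "$dir/1.csv"
printf 'id,name\n3,c\n' > "$dir/2.csv"

# Without -q, tail prints a "==> file <==" header before each file's lines.
with_banners=$(tail -n +2 "$dir"/*.csv)

# With -q, only the data lines (from line 2 onward) are written.
tail -q -n +2 "$dir"/*.csv > "$out"
combined=$(cat "$out")

printf 'without -q:\n%s\n\nwith -q:\n%s\n' "$with_banners" "$combined"
rm -r "$dir" "$out"
```

In the real directory the equivalent would be something like tail -q -n +2 *.csv > ../combined.csv, where the output location is only a suggestion.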
Upvotes: 7
Reputation: 88756
With GNU sed:
sed -ns '2,$p' 1.csv 2.csv 3.csv > combined.csv
or
sed -ns '2,$p' *.csv > combined.csv
Upvotes: 1