Reputation: 2334
I have several Company_***.csv files
(although the separator is a tab, not a comma; hence they should be *.tsv, but never mind), each of which contains a header plus numerous data lines, e.g.:
1stHeader 2ndHeader DateHeader OtherHeaders...
111111111 SOME STRING 2020-08-01 OTHER STRINGS..
222222222 ANOT STRING 2020-08-02 OTHER STRINGS..
I have to split them according to the 3rd column (here, it's a date).
Each file should be named like e.g. Company_2020_08_01.csv
Company_2020_08_02.csv
and so on,
each containing the same header on the 1st line, followed by the matching rows.
At first I thought about saving (once) the header in a single file e.g.
sed -n '1w Company_header.csv' Company_*.csv
then parsing the files with a pattern for the date (hence the headers would be skipped), e.g.
sed -n '/\t2020-[01][0-9]-[0-3][0-9]\t/w somefilename.csv' Company_*.csv
... and at last, insert the (missing) header in each generated file.
But I'm stuck at step 2: I can't find how I could dynamically generate the "filename" expected by the w
command, nor how to capture the date in the search pattern (because apparently this is just an address, not a search-replace "field" as in the s/regexp/replacement/[flags]
command, so you can't have capturing groups ( )
in there).
So I wonder if this is actually doable with sed
, or whether I should look into other tools, e.g. awk
?
Disclaimer: I'm quite a n00b with these commands so I'm just learning/starting from scratch...
Upvotes: 0
Views: 50
Reputation: 241988
Perl to the rescue!
perl -e 'while (<>) {
    $h = $_, next if $. == 1;   # remember the header, skip to the next line
    $. = 0 if eof;              # reset the line counter at the end of each file
    @c = split /\t/;            # split the line into columns on tabs
    open my $out, ">>", "Company_" . $c[2] =~ tr/-/_/r . ".csv" or die $!;
    print {$out} $h unless tell $out;  # header only if the file is still empty
    print {$out} $_;            # append the data line
}' -- Company_*.csv
<> in scalar context reads a line from the input. The header line is remembered in $h; the line counter $. is reset at the end of each input file (see eof), so the first line of every file is recognized as a header. split populates the @c array with the column values of each line. $c[2] contains the date; using tr we translate its dashes to underscores to create the output filename from it. open opens that file for appending, and tell returns 0 only for an empty file, so the header is printed just once per output file.
Note that the script only appends to the files, so don't forget to delete any output files before running it again.
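Since the question also asks about awk, the same splitting can be done there; a minimal sketch, where the Company_demo.csv sample file and its contents are hypothetical stand-ins for the real data:

```shell
# Create a hypothetical tab-separated sample mirroring the question's layout
printf '1stHeader\t2ndHeader\tDateHeader\n'   >  Company_demo.csv
printf '111111111\tSOME STRING\t2020-08-01\n' >> Company_demo.csv
printf '222222222\tANOT STRING\t2020-08-02\n' >> Company_demo.csv

awk -F'\t' '
    FNR == 1 { header = $0; next }      # remember the header of each file
    {
        out = "Company_" $3 ".csv"
        gsub(/-/, "_", out)             # 2020-08-01 -> 2020_08_01
        if (!(out in seen)) {           # first row for this date:
            print header > out          # start the file with the header
            seen[out] = 1
        }
        print > out                     # then write the data row
    }
' Company_demo.csv
```

Unlike the Perl version, re-running this is safe: inside awk, print > file truncates the file on its first use in a run and appends afterwards. Just make sure the generated Company_2020_*.csv files don't match the input glob on a second run.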
Upvotes: 1