How to split a file (with sed) into numerous files according to a value found on each line?

Question

I have several Company_***.csv files (although the separator is a tab, not a comma, so they should really be *.tsv, but never mind), each of which contains a header plus numerous data lines, e.g.:

1stHeader   2ndHeader   DateHeader  OtherHeaders...
111111111   SOME STRING 2020-08-01  OTHER STRINGS..
222222222   ANOT STRING 2020-08-02  OTHER STRINGS..

I have to split them according to the 3rd column, which holds a date.

Each output file should be named e.g. Company_2020_08_01.csv, Company_2020_08_02.csv and so on, and should contain the same header on the 1st line followed by the matching rows.

At first I thought about saving the header (once) to a separate file, e.g.

 sed -n '1w Company_header.csv' Company_*.csv

then parsing the files with a pattern matching the date (so the header lines would be skipped), e.g.

sed -n '/	2020-[01][0-9]-[0-3][0-9]	/w somefilename.csv' Company_*.csv

... and finally, inserting the (missing) header into each generated file.
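Step 3 on its own can be done with plain cat and a temporary file. A minimal sketch, assuming the header was saved to Company_header.csv as above and that only the generated per-date files are passed as arguments (add_header is a hypothetical name):

```shell
# Hypothetical helper for step 3: prepend the saved header to each file.
# Assumes Company_header.csv holds the header line and that every
# argument is a generated per-date file (never an original input file).
add_header() {
  for out in "$@"; do
    # Write header + existing rows to a temp file, then replace the original.
    cat Company_header.csv "$out" > "$out.tmp" && mv "$out.tmp" "$out"
  done
}
```

Usage, assuming all dates fall in 2020 and no input file name starts with Company_2020: add_header Company_2020_*.csv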

But I'm stuck at step 2: I can't find how to generate, dynamically, the file name expected by the w command, nor how to capture the date in the search pattern (apparently this is just an address, not a search-and-replace "field" as in the s/regexp/replacement/[flags] command, so you can't use capturing groups ( ) there).
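Capturing groups are available in the s command, though, so one workaround that stays within sed is to let a first sed turn each distinct date into a ready-made "/address/w file" line, then run that generated script over all data rows in a single pass. A sketch, assuming tab-separated input whose 3rd column is a YYYY-MM-DD date followed by at least one more column (as in the sample) and identical headers in all files; split_by_generated_script is a hypothetical name:

```shell
# Hypothetical function: sed writes a sed script, a second sed runs it.
# Assumes column 3 is YYYY-MM-DD and is followed by another tab-separated column.
split_by_generated_script() {
  tab=$(printf '\t')
  script=$(mktemp) || return 1
  # Step 1: save the header of the first input file.
  sed -n '1p' "$1" > Company_header.csv
  # Distinct dates (header lines dropped), each rewritten via \1 \2 \3 into a line like
  #   /^[^TAB]*TAB[^TAB]*TAB2020-08-01TAB/w Company_2020_08_01.csv
  for f in "$@"; do sed '1d' "$f"; done | cut -f3 | sort -u |
    sed -n "s/^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)\$/\/^[^$tab]*$tab[^$tab]*$tab\1-\2-\3$tab\/w Company_\1_\2_\3.csv/p" > "$script"
  # Step 2: one pass over all data rows; each w file is created before input is read.
  for f in "$@"; do sed '1d' "$f"; done | sed -n -f "$script"
  # Step 3: the w files hold only rows, so put the header in front of each.
  for out in $(sed 's/.*\/w //' "$script"); do
    cat Company_header.csv "$out" > "$out.tmp" && mv "$out.tmp" "$out"
  done
  rm -f "$script"
}
```

Usage: split_by_generated_script Company_*.csv. As far as I know, POSIX only requires sed to support ten distinct w files per script; GNU sed accepts many more. Also note that Company_header.csv and the generated files match Company_*.csv too, so a second run over the same glob would pick them up as input.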

So, is this actually doable with sed, or should I look into other tools, e.g. awk?
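With awk the whole job fits in one pass, because the output file name of a print redirection can be any string expression. A sketch, assuming tab-separated files whose 3rd column is the date and whose header lines are identical (split_with_awk is a hypothetical name):

```shell
# Hypothetical function: one awk pass, one output file per date in column 3.
split_with_awk() {
  awk -F '\t' '
    FNR == 1 { header = $0; next }     # remember each input file header, skip it
    {
      date = $3
      gsub(/-/, "_", date)             # 2020-08-01 -> 2020_08_01
      out = "Company_" date ".csv"
      if (!(out in seen)) {            # first row for this date:
        seen[out] = 1
        print header > out             # write the header first
      }
      print > out                      # same open file, so this appends
    }
  ' "$@"
}
```

Usage: split_with_awk Company_*.csv. Within one awk run, print > file truncates the file only when it is first opened; later prints to the same name append. With many distinct dates some awk implementations can run out of open files; calling close(out) after each write and using >> for the rows is the usual workaround.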

Disclaimer: I'm quite a n00b with these commands, so I'm learning from scratch...
