Reputation: 635
I have an input file that looks like this:
1S6290615260715DUTCH-ALDI ROTTERDAM, EUDOKIAPLEIN 8 00002961999
20000010019149GRANEN 0000000100000001590 0000111
20000010019592ALASKA KOOLVISFILET 0000001270000024003 0000111
20000010022614PAPRIKA 3 ST 0000000460000005934 0000111
1S6290615260715DUTCH-ALDI BERGEN NH, JAN OLDENBURGLAAN 00002962888
20000000000404BLEEKMIDDEL 0000000900000003150 0000222
20000000005197FRUIT 0000000430000005977 0000222
20000000006013ROOIBOSTHEE 0000000140000001246 0000222
1S6290615260715DUTCH-ALDI DWINGELOO, HEUVELENWEG 00002963777
20000000006469PITABROODJES 0000000610000004209 0000333
20000000007372SCHENKSTROOP 0000000210000001869 0000333
20000000007545HUISVUILZAKKEN 0000001080000012852 0000333
1S6290615260715DUTCH-ALDI BARNEVELD, CATHARIJNESTEEG 00002964666
20000000005197FRUIT + GRANEN BISCUITS 0000000720000010008 0000444
20000000005209IJSASSORTI MINIMIX 0000000190000003781 0000444
20000000006013ROOIBOSTHEE 0000000210000001869 0000444
I need to break this file into multiple files based on a pattern match. In this file, each section starts with a line beginning with 1S6290615260715, and I need to create one file per section, like this:
File 1:
1S6290615260715DUTCH-ALDI ROTTERDAM, EUDOKIAPLEIN 8 00002961999
20000010019149GRANEN 0000000100000001590 0000111
20000010019592ALASKA KOOLVISFILET 0000001270000024003 0000111
20000010022614PAPRIKA 3 ST 0000000460000005934 0000111
File 2:
1S6290615260715DUTCH-ALDI BERGEN NH, JAN OLDENBURGLAAN 00002962888
20000000000404BLEEKMIDDEL 0000000900000003150 0000222
20000000005197FRUIT 0000000430000005977 0000222
20000000006013ROOIBOSTHEE 0000000140000001246 0000222
and so on.
Using awk, I tried this command:
awk '/^1S/f++ {print $0 > "file"f}' input.txt
With this, every output file contains only a single line.
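A small reproduction may help show why the attempt writes one line per file. This sketch is an illustration, not part of the original question. It assumes a POSIX shell with `mktemp` and uses the first three lines of the sample input. In the pattern `/^1S/f++`, awk treats the regex match and `f++` as the two operands of a string concatenation, so the pattern evaluates to strings such as "10" or "01". A non-empty string counts as true, so the action runs on every line and `f` is incremented for every record, which gives each line its own file.

```shell
#!/bin/sh
# Reproduce the one-line-per-file behaviour of the original attempt
# on a small sample (first three lines of the input shown above).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' \
  '1S6290615260715DUTCH-ALDI ROTTERDAM, EUDOKIAPLEIN 8 00002961999' \
  '20000010019149GRANEN 0000000100000001590 0000111' \
  '20000010019592ALASKA KOOLVISFILET 0000001270000024003 0000111' > input.txt

# /^1S/f++ is a concatenation: (result of the regex match) followed by (f++).
# The result ("10", "01", "02", ...) is never an empty string, so the
# pattern is true on every line and f is incremented for every record.
awk '/^1S/f++ {print $0 > "file"f}' input.txt

ls file*    # lists file1, file2 and file3, each holding one line
```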
Please suggest a fast way to do this with either sed or awk, because I need to process very large files (15 GB to 20 GB) and pass the split files to the Hadoop framework for further processing.
Reputation: 785246
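You can use this awk command. It starts a new output file each time a line begins with 1S and closes the previous one, so only one output file is open at a time:
awk '/^1S/{if (f) close(f); f = "file" ++i} {print > f}' input.txt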
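A minimal sketch of the answer's approach on part of the sample input, assuming a POSIX shell with `mktemp`. Two details are assumptions added for illustration, not part of the answer: the hypothetical `parts/` directory with zero-padded names built by `sprintf("parts/file%06d", ++i)`, so the pieces sort in input order when listed, and running awk under `LC_ALL=C`, which can speed up regex matching in GNU awk on plain ASCII data. Because awk processes one record at a time, memory use does not grow with the size of the input. Like the answer, it assumes the first input line is a `1S` header; otherwise the file name is still empty when the first `print` runs.

```shell
#!/bin/sh
# Sketch: one output file per section that starts with a "1S" line.
# Assumptions (not from the answer): the sample below, LC_ALL=C, and
# zero-padded names under a hypothetical "parts" directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' \
  '1S6290615260715DUTCH-ALDI ROTTERDAM, EUDOKIAPLEIN 8 00002961999' \
  '20000010019149GRANEN 0000000100000001590 0000111' \
  '1S6290615260715DUTCH-ALDI BERGEN NH, JAN OLDENBURGLAAN 00002962888' \
  '20000000000404BLEEKMIDDEL 0000000900000003150 0000222' \
  '20000000005197FRUIT 0000000430000005977 0000222' > input.txt

mkdir -p parts

# On each "1S" header: close the previous output file and build the next
# name; then print every line (header included) to the current file.
# Assumes the input starts with a header line, so f is set before printing.
LC_ALL=C awk '
  /^1S/ { if (f) close(f); f = sprintf("parts/file%06d", ++i) }
  { print > f }
' input.txt

ls parts    # lists file000001 and file000002
```

If `mawk` is installed, it is another option worth timing against GNU awk on a sample of the real data, since it is often faster for simple streaming jobs like this one.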