Abhinay

Reputation: 635

How to create multiple files based on a pattern match using sed or awk

I have an input file that looks like this:

1S6290615260715DUTCH-ALDI          ROTTERDAM, EUDOKIAPLEIN 8                          00002961999
20000010019149GRANEN                                            0000000100000001590  0000111
20000010019592ALASKA KOOLVISFILET                               0000001270000024003  0000111
20000010022614PAPRIKA 3 ST                                      0000000460000005934  0000111
1S6290615260715DUTCH-ALDI          BERGEN NH, JAN OLDENBURGLAAN                       00002962888
20000000000404BLEEKMIDDEL                                       0000000900000003150  0000222
20000000005197FRUIT                                             0000000430000005977  0000222
20000000006013ROOIBOSTHEE                                       0000000140000001246  0000222
1S6290615260715DUTCH-ALDI          DWINGELOO, HEUVELENWEG                             00002963777
20000000006469PITABROODJES                                      0000000610000004209  0000333
20000000007372SCHENKSTROOP                                      0000000210000001869  0000333
20000000007545HUISVUILZAKKEN                                    0000001080000012852  0000333
1S6290615260715DUTCH-ALDI          BARNEVELD, CATHARIJNESTEEG                         00002964666
20000000005197FRUIT + GRANEN BISCUITS                           0000000720000010008  0000444
20000000005209IJSASSORTI MINIMIX                                0000000190000003781  0000444
20000000006013ROOIBOSTHEE                                       0000000210000001869  0000444

I need to break this file into multiple files based on a pattern match. Each record starts with a line beginning with 1S6290615260715, and I need a new file to start at each such line, like this:

File 1:

1S6290615260715DUTCH-ALDI          ROTTERDAM, EUDOKIAPLEIN 8                          00002961999
20000010019149GRANEN                                            0000000100000001590  0000111
20000010019592ALASKA KOOLVISFILET                               0000001270000024003  0000111
20000010022614PAPRIKA 3 ST                                      0000000460000005934  0000111

File 2:

1S6290615260715DUTCH-ALDI          BERGEN NH, JAN OLDENBURGLAAN                       00002962888
20000000000404BLEEKMIDDEL                                       0000000900000003150  0000222
20000000005197FRUIT                                             0000000430000005977  0000222
20000000006013ROOIBOSTHEE                                       0000000140000001246  0000222

and so on.

Using awk, I tried this command:

awk '/^1S/f++ {print $0 > "file"f}' input.txt

With this, each output file ends up containing only a single line.
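A likely explanation, going by the POSIX awk grammar: /^1S/f++ is parsed as a single pattern, the concatenation of the regex match result (0 or 1) with the value of f++. The concatenated string ("00", "01", "10", ...) is never empty, so the pattern is true on every line, and f is also incremented on every line. The small check below prints the line number next to f; the temporary directory and sample.txt are throwaway names for the demo only.

```shell
# Throwaway working directory for the demo.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Two lines, neither of which starts with "1S".
printf 'abc\ndef\n' > sample.txt

# "/^1S/f++" concatenates the match result with f++, giving a
# non-empty string, so the pattern is always true and f grows per line.
awk '/^1S/f++ {print FNR, f}' sample.txt
```

This prints "1 1" and then "2 2": both lines pass the pattern even though neither starts with 1S, which is why the original command writes every line to its own file.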

Please suggest the fastest way to do this with either sed or awk, because I need to process very large files (15 GB to 20 GB) and pass the split files to the Hadoop framework for further processing.
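One possible variant, sketched here as an assumption rather than a measured improvement: compare the first 15 characters of each line with the full header prefix 1S6290615260715 as a plain string instead of running a regex, and skip any lines that appear before the first header. The zero-padded output names (file00001, file00002, ...) and the shortened sample lines are hypothetical choices for illustration. Whether the string comparison is actually faster depends on the awk implementation, so it is worth timing on a real 15 GB file.

```shell
# Throwaway working directory for the demo.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Shortened sample in the same layout as the question.
printf '%s\n' \
  '1S6290615260715DUTCH-ALDI ROTTERDAM' \
  '20000010019149GRANEN' \
  '20000010019592ALASKA KOOLVISFILET' \
  '1S6290615260715DUTCH-ALDI BERGEN NH' \
  '20000000000404BLEEKMIDDEL' > input.txt

# Assumption: every record header starts with exactly this 15-char prefix.
awk -v hdr='1S6290615260715' '
  substr($0, 1, length(hdr)) == hdr {
    if (out) close(out)               # release the previous output file
    out = sprintf("file%05d", ++n)    # hypothetical zero-padded name
  }
  out { print > out }                 # lines before the first header are skipped
' input.txt
```

Zero-padded names keep the files in record order when listed or globbed, which may be convenient when handing them to another tool.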

Upvotes: 2

Views: 525

Answers (1)

anubhava

Reputation: 785246

You can use this awk:

awk '/^1S/{if (f) close(f); f = "file" ++i} {print > f}' input.txt
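To show what each part of this one-liner does, here is the same program spread over several lines with comments, run on a shortened sample in the question's layout (the temporary directory and the trimmed sample lines are only for the demo).

```shell
# Throwaway working directory for the demo.
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '%s\n' \
  '1S6290615260715DUTCH-ALDI ROTTERDAM' \
  '20000010019149GRANEN' \
  '1S6290615260715DUTCH-ALDI BERGEN NH' \
  '20000000000404BLEEKMIDDEL' \
  '20000000005197FRUIT' > input.txt

awk '
  /^1S/ {              # a header line starts a new record
    if (f) close(f)    # close the previous output file, if any
    f = "file" ++i     # next output name: file1, file2, ...
  }
  { print > f }        # every line, header included, goes to the current file
' input.txt

# Show the resulting files.
for name in file*; do
  echo "== $name"
  cat "$name"
done
```

Closing each file before opening the next keeps only one output file open at a time; this matters for a 15 GB input with many records, because the operating system limits how many files a process can have open at once. If the input might begin with lines that come before the first 1S header, f is still empty for them and the redirection fails; guarding the print as f { print > f } skips such lines.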

Upvotes: 2
