Zam89

Reputation: 143

RegEx split match by keyword

I have this text:

     CTN1: CAIU3201968
     order No. 1900958
     palisade, all sides middle picked, G603
1000755   10     25       150     16 pcs/pallet    4 pallets      64,00 pce        77,25     4.944,00
     palisade, all sides middle picked, G603
1000753   10     25       100     20 pcs/pallet   16 pallets      320,00 pce       51,50    16.480,00
     CTN2: BSIU3070499
     order No. 1900958
     palisade, all sides middle picked, G603
1007780   10     25       125     18 pcs/pallet    4 pallets      72,00 pce        64,38     4.635,00
     palisade, all sides middle picked, G603
1000751   10     25          60   40 pcs/pallet    2 pallets      80,00 pce        30,90     2.472,00
     palisade, all sides middle picked, G603
1000752   10     25          80   24 pcs/pallet    5 pallets      120,00 pce       41,20     4.944,00
     palisade, all sides middle picked, G603
1000753   10     25       100     20 pcs/pallet    3 pallets      60,00 pce        51,50     3.090,00
     palisade, all sides middle picked, G603
1001526    8     20       100     36 pcs/pallet    5 pallets      180,00 pce       37,00     6.660,00
     CTN3: NYKU3708986
     order No. 1900958
     palisade, all sides middle picked, G603
1000751   10     25          60   40 pcs/pallet    9 pallets      360,00 pce       30,90    11.124,00
     palisade, all sides middle picked, G603
1002452   10     25          75   24 pcs/pallet   11 pallets      264,00 pce       38,63    10.197,00

I need one RegEx match group per CTN, so Group one is:

    CTN1: CAIU3201968
     order No. 1900958
     palisade, all sides middle picked, G603
    1000755   10     25       150     16 pcs/pallet    4 pallets      64,00 pce        77,25     4.944,00
     palisade, all sides middle picked, G603
    1000753   10     25       100     20 pcs/pallet   16 pallets      320,00 pce       51,50    16.480,00

Group two:

CTN2: BSIU3070499
     order No. 1900958
     palisade, all sides middle picked, G603
1007780   10     25       125     18 pcs/pallet    4 pallets      72,00 pce        64,38     4.635,00
     palisade, all sides middle picked, G603
1000751   10     25          60   40 pcs/pallet    2 pallets      80,00 pce        30,90     2.472,00
     palisade, all sides middle picked, G603
1000752   10     25          80   24 pcs/pallet    5 pallets      120,00 pce       41,20     4.944,00
     palisade, all sides middle picked, G603
1000753   10     25       100     20 pcs/pallet    3 pallets      60,00 pce        51,50     3.090,00
     palisade, all sides middle picked, G603
1001526    8     20       100     36 pcs/pallet    5 pallets      180,00 pce       37,00     6.660,00

and so on.

What I have tried is this regex:

CTN\d{1,2}:((.|\n)*)CTN\d{1,2}:

But this gives me only a single match that contains everything up to the last CTN header, so the last group is excluded.

Upvotes: 1

Views: 48

Answers (2)

The fourth bird

Reputation: 163207

As an alternative to using [\S\s]*?, you could use a negative lookahead (?!...) so that only lines which do not start with the CTN pattern are matched, which limits the amount of backtracking.

^[^\S\r\n]*CTN\d{1,2}:.*(?:\r?\n(?![^\S\r\n]*CTN\d).+)*
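  • ^ Start of the string, or of each line when the multiline flag is enabled (needed here so that every CTN block can match)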
  • [^\S\r\n]* Match 0+ whitespace chars except a newline
  • CTN\d{1,2}: Match CTN, 1 or 2 digits and a colon
  • .* Match any char except a newline 0+ times
  • (?: Non capturing group
    • \r?\n Match a newline
    • (?![^\S\r\n]*CTN\d) Negative lookahead, assert that the line does not start with optional whitespace followed by CTN and a digit
    • .+ Match any char except a newline 1+ times
  • )* Close non capturing group and repeat the group 0+ times

If at least one line must follow the CTN line, you could repeat the group 1+ times by using )+ instead of )*
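
As a rough illustration, here is how the pattern could be applied in Python. The language choice, the shortened sample text, and the names TEXT, CTN_BLOCK and blocks are assumptions made for this sketch; the question does not say which regex engine is used. The multiline flag is required so that ^ can match at the start of each CTN line.

```python
import re

# Shortened sample of the text from the question (assumed input).
TEXT = (
    "     CTN1: CAIU3201968\n"
    "     order No. 1900958\n"
    "     palisade, all sides middle picked, G603\n"
    "1000755   10     25       150     16 pcs/pallet    4 pallets      64,00 pce        77,25     4.944,00\n"
    "     CTN2: BSIU3070499\n"
    "     order No. 1900958\n"
    "     palisade, all sides middle picked, G603\n"
    "1007780   10     25       125     18 pcs/pallet    4 pallets      72,00 pce        64,38     4.635,00\n"
    "     CTN3: NYKU3708986\n"
    "     order No. 1900958\n"
)

# re.MULTILINE lets ^ match at the start of every line, not only at the
# start of the string.
CTN_BLOCK = re.compile(
    r"^[^\S\r\n]*CTN\d{1,2}:.*(?:\r?\n(?![^\S\r\n]*CTN\d).+)*",
    re.MULTILINE,
)

# One whole match per CTN block; no capture groups are needed.
blocks = [m.group(0) for m in CTN_BLOCK.finditer(TEXT)]

for number, block in enumerate(blocks, start=1):
    print(f"--- block {number} ---")
    print(block)
```

Because .+ needs at least one character, an empty line ends the current block, so this sketch assumes the blocks contain no blank lines.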


Upvotes: 2

CAustin

Reputation: 4614

Give this a try:

\s*CTN\d{1,2}:[\S\s]*?(?=\s+CTN|\Z)

https://regex101.com/r/ElfRn6/1
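
A minimal sketch of the same pattern in Python, assuming Python's re module and a shortened version of the sample text (the names TEXT, LAZY_BLOCK and blocks are made up for this example). No flags are needed because [\S\s] already matches newlines. Each match can start with the whitespace that precedes CTN, so the matches are stripped here.

```python
import re

# Shortened sample of the text from the question (assumed input).
TEXT = (
    "     CTN1: CAIU3201968\n"
    "     order No. 1900958\n"
    "     CTN2: BSIU3070499\n"
    "     order No. 1900958\n"
    "     CTN3: NYKU3708986\n"
    "     order No. 1900958\n"
)

# Lazy match up to the next "whitespace + CTN" or the end of the string.
# In Python, \Z matches only at the very end of the string.
LAZY_BLOCK = re.compile(r"\s*CTN\d{1,2}:[\S\s]*?(?=\s+CTN|\Z)")

# strip() drops the leading newline/indentation that \s* pulls in.
blocks = [m.group(0).strip() for m in LAZY_BLOCK.finditer(TEXT)]

for block in blocks:
    print(repr(block))
```

The lookahead stops at any whitespace followed by CTN, so this variant assumes that text does not occur inside a block's own lines.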

Upvotes: 2
