Reputation: 301
Context: Every day I receive an email containing the reservation details of several customers, and I have to split it into one piece per reservation according to a set of rules. This is an example of the email:
A N K U N F T 11.08.15
*** NEUBUCHUNG ***
11.08.15 xxx xxx X3 2830 14:25 17:50
18.08.15 xxx xxx X3 2831 18:40
F882129 dsdsaidsaia
F882129 xxxyxyagydaysd
sadsdsdsdsadsadadssda
sadsdsdsdsadsadadssda
**«CUT HERE2»**
A N K U N F T 18.08.15
*** NEUBUCHUNG ***
11.08.15 xxx xxx X3 2830 14:25 17:50
18.08.15 xxx xxx X3 2831 18:40
F881554 ZXCXZCXCXZCCXZ
F881554 xcvcxvcxvcvxc
F881554 xvcxvcxcvxxvccvxxcv
**«CUT HERE»**
11.08.15 xxx xxx X3 2830 14:25 17:50
18.08.15 xxx xxx X3 2831 18:40
F881605 xczxcdfsfdsdfs
F881605 zxccxzxzdffdsfds
**«CUT HERE»**
So it basically has to be cut after the last line that starts with F999999 (where 9 can be any digit), because F999999 is the reservation code.* I inserted the text «CUT HERE» only to show where the cut belongs.
*NOTE: the reservation code may have one of the following formats: F999999, A999999, E999999 or 999999.
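As a small illustration of the four formats in the note, the sketch below checks whether a line starts with a reservation code. The pattern, the variable names and the third sample line (a code without a letter prefix) are assumptions made for this example; the other lines are taken from the emails shown in this question.

```php
<?php
// Assumed pattern: optional F, A or E, then exactly six digits, at the start of a line.
// \b keeps a longer digit run (e.g. a 7-digit number) from being read as a code.
$codePattern = '~^\h*([FAE]?\d{6})\b~';

$lines = [
    ' F882129 dsdsaidsaia',
    'A025808 HERR Berg, Ulrich 62',
    '881605 xczxcdfsfdsdfs',      // hypothetical line: code without a letter prefix
    'FNC011 Enotel Baia',         // hotel code, not a reservation code
    '11.08.15 xxx xxx X3 2830',   // flight line, starts with a date
];

$codes = [];
foreach ($lines as $line) {
    if (preg_match($codePattern, $line, $m)) {
        $codes[] = $m[1];
    }
}

print_r($codes);
```

Only the first three lines are recognised as reservation lines; the hotel code and the date line are skipped.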
So I apply a working preg_split with the following regex:
Regex1 = "/(?:\\s(F|A|E)?\\d{6}\\s?+.*?\r\n\\s?\r\n)\\K/ms";
However, sometimes I have to cut where «CUT HERE2» appears, because there can be some extra text after the last reservation code line.
So I created this regex:
Regex2 = "/^\h*(F|A|E)?\d{6}.*?\R{2}\K/ms"
Yet I sometimes get this format (blank lines between the F999999 lines of the same reservation), which makes my previous regex (Regex2) cut where it says «NOT CUT HERE»:
A N K U N F T 11.08.15
*** NEUBUCHUNG ***
11.08.15 xxx xxx X3 2830 14:25 17:50
18.08.15 xxx xxx X3 2831 18:40
F882129 dsdsaidsaia
<<NOT CUT HERE>>
F882129 xxxyxyagydaysd
sadsdsdsdsadsadadssda
sadsdsdsdsadsadadssda
**«CUT HERE»**
A N K U N F T 18.08.15
*** NEUBUCHUNG ***
11.08.15 xxx xxx X3 2830 14:25 17:50
18.08.15 xxx xxx X3 2831 18:40
F881554 ZXCXZCXCXZCCXZ
<<NOT CUT HERE>>
F881554 xcvcxvcxvcvxc
F881554 xvcxvcxcvxxvccvxxcv
**«CUT HERE»**
11.08.15 xxx xxx X3 2830 14:25 17:50
18.08.15 xxx xxx X3 2831 18:40
F881605 xczxcdfsfdsdfs
F881605 zxccxzxzdffdsfds
**«CUT HERE»**
I just want it to cut where «CUT HERE» appears.
For example, the error happens with this input:
***NEUBUCHUNG ***
23.02.17 DUS FNC DE 1414 12:05 15:10
09.03.17 FNC DUS DE 1415 16:40
FNC011 Enotel Baia 9360-215 Ponta do Sol
1 DZ Typ I Meerblick 2Erw. Frühstück
am 03.10.16 CRS: MX - PNR: 1290689
Fluggeber: Condor Flugdienst / PNR: 1290689 Frühbucher 10% inkl. Reiseleitung und Transfer ab/bis
A025808 HERR Berg, Ulrich 62
<<NOT CUT HERE>>
Anfrage.
A025808 FRAU Berghaus, Petra 58
**«CUT HERE»**
***S T O R N O **
04.10.16 STR X3 2810
11.10.16 FNC STR X3 2811 18:15
FNC036 The Flame Tree Funchal
1 DZ Meerblick 2Erw. H
A987025 FRAU BURG, GERTRUD *** STORNO *** O
<<NOT CUT HERE>>
A987025 HERR BURG, WALTER *** STORNO *** O
**«CUT HERE»**
***ÄNDERUNG ***
NEU:01.11.16 FRA X3 2806 13:35 16:50
08.11.16 FNC FRA X3 2807 17:40
FNC813 Golden Residence/Wanderk. 9000-105 Funchal
1 Suite seitl. Meerblick 3Erw. F
A982512 FRAU KROST, SIMONE
Frühbucher 15%
<<NOT CUT HERE>>
inkl. Reiseleitung
und Transfer ab/bis
Im Reisepreis bereits enthalten: Drei
geführte Wanderungen (1 Ganztags- und 2
Halbtagswanderungen) inkl. aller
Transfers.
**«SHOULD CUT HERE»**
***ÄNDERUNG ***
ALT:01.11.16 FRA X3 2806 13:35 16:50
08.11.16 FNC FRA X3 2807 17:40
FNC813 Golden Residence/Wanderk. 9000-105 Funchal
1 Suite seitl. Meerblick 3Erw. F
A982512 HERR KROST, SIMONE
**«CUT HERE»**
25.04.17 DRS FNC ST 1602 13:25 17:15
09.05.17 FNC DRS ST 1607 00:00
FNC076 Baia Azul 9004-530 Funchal
1 DZ Typ I Meerblick 2Erw. Halbpension
am 03.10.16 CRS: MX - PNR: 15326821
Fluggeber: alltours / PNR: 15326821
inkl. Reiseleitung
und Transfer ab/bis Flughafen
A025986 HERR Schulze, Steffen 55
A025986 FRAU Schulze, Kerstin 54
**«CUT HERE»**
***S T O R N O **
13.11.16 FRA X3 2806
20.11.16 FNC FRA X3 2807 17:35
FNC096 Pestana Village & Miramar Funchal
1 Studio 2Erw. H
A976918 FRAU HEBING, BETTINA *** STORNO *** O
<<NOT CUT HERE>>
A976918 HERR HEBING, LUDGER *** STORNO *** O
**«CUT HERE»**
I put «NOT CUT HERE» where it splits but shouldn’t, «SHOULD CUT HERE» where it should cut but doesn’t, and «CUT HERE» where it cuts correctly.
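One possible direction, offered only as a sketch and not as a confirmed solution: instead of cutting after a reservation code line, cut at a blank line only when the next line looks like the start of a new block. This assumes that every block starts with a `***` header, an `A N K U N F T` header or a flight line beginning with a `dd.mm.yy` date, and that no blank line ever separates a header from the lines that belong to it. The sample string is abbreviated from the input above, and the variable names are made up for the example.

```php
<?php
// Abbreviated sample: blank lines occur inside a reservation and between reservations.
$mail = "***NEUBUCHUNG ***\n"
      . "23.02.17 DUS FNC DE 1414 12:05 15:10\n"
      . "A025808 HERR Berg, Ulrich 62\n"
      . "\n"
      . "Anfrage.\n"
      . "A025808 FRAU Berghaus, Petra 58\n"
      . "\n"
      . "***S T O R N O **\n"
      . "04.10.16 STR X3 2810\n"
      . "A987025 FRAU BURG, GERTRUD\n"
      . "\n"
      . "A987025 HERR BURG, WALTER\n"
      . "\n"
      . "25.04.17 DRS FNC ST 1602 13:25 17:15\n"
      . "A025986 HERR Schulze, Steffen 55\n";

// Match one or more blank lines, but only if the following line starts with
// "***", "A N K U N F T" or a dd.mm.yy date (assumed block starts, see above).
$blockStart = '~\R(?:\h*\R)+(?=\h*(?:\*{3}|A N K U N F T|\d{2}\.\d{2}\.\d{2}\b))~';

// The blank lines themselves are the delimiter, so they are dropped from the pieces.
$parts = preg_split($blockStart, $mail, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

With this sample the blank lines before `Anfrage.` and before the second `A987025` line do not trigger a cut, because the line after them is neither a header nor a date line. If a real email contains free text that itself starts with a date, this heuristic would cut too early there.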
Upvotes: 1
Views: 122
Reputation: 626870
You may use
'~^\h*F\d{6}.*?\R{2}\K~sm'
See the regex demo
Details:
^ - start of a line
\h* - 0+ horizontal whitespaces
F\d{6} - F + 6 digits
.*? - any 0+ chars up to the first
\R{2} - 2 linebreaks
\K - and omit the whole match text.

See PHP demo:
$re = '~^\h*F\d{6}.*?\R{2}\K~ms';
$str = "A N K U N F T 11.08.15\n*** NEUBUCHUNG ***\n 11.08.15 xxx xxx X3 2830 14:25 17:50\n 18.08.15 xxx xxx X3 2831 18:40\n F882129 dsdsaidsaia\n F882129 xxxyxyagydaysd\nsadsdsdsdsadsadadssda\nsadsdsdsdsadsadadssda\n\nA N K U N F T 18.08.15\n*** NEUBUCHUNG ***\n 11.08.15 xxx xxx X3 2830 14:25 17:50\n 18.08.15 xxx xxx X3 2831 18:40\n F881554 ZXCXZCXCXZCCXZ\n F881554 xcvcxvcxvcvxc\n F881554 xvcxvcxcvxxvccvxxcv\n\n\n11.08.15 xxx xxx X3 2830 14:25 17:50\n 18.08.15 xxx xxx X3 2831 18:40\n F881605 xczxcdfsfdsdfs\n F881605 zxccxzxzdffdsfds\n\n";
print_r(preg_split($re, $str));
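To see why the `\K` at the end matters, here is a minimal sketch comparing the same pattern with and without it. The short input string and the variable names are made up for this comparison; `PREG_SPLIT_NO_EMPTY` is added only to drop empty pieces.

```php
<?php
// Made-up input: two blocks, each ending with a reservation line and a blank line.
$str = "Hotel A\nF111111 a\n\nHotel B\nF222222 b\n\n";

// With \K the reported match is an empty string located right after the two
// line breaks, so preg_split() cuts there and nothing is removed from the text.
$withK = preg_split('~^\h*F\d{6}.*?\R{2}\K~ms', $str, -1, PREG_SPLIT_NO_EMPTY);

// Without \K the reservation line and the line breaks are the delimiter,
// so preg_split() throws them away.
$withoutK = preg_split('~^\h*F\d{6}.*?\R{2}~ms', $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($withK);
print_r($withoutK);
```

The first call keeps each block whole, including its reservation line; the second call returns only the `Hotel A` and `Hotel B` lines.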
Upvotes: 1