scorezel789

Reputation: 163

How to extract all words before a match?

Is there a regex that would separate the title and the address in the text below, producing the output shown?

This is what I have so far:

.+?(?=\d+.*Singapore \d{6}\b)

Text:

Marina Bay Sands Relocated! 2 Bayfront Avenue Galleria Level #B1-01 Singapore 018972
+65 6634 9969
nex 23 Serangoon Central #B1-10 Singapore 556083
+65 6634 7787
Northpoint City 1 Northpoint Drive South Wing #B1-107 Singapore 768019
+65 6481 3433

Output:

Marina Bay Sands Relocated! 
2 Bayfront Avenue Galleria Level #B1-01 Singapore 018972

nex 
23 Serangoon Central #B1-10 Singapore 556083

Northpoint City 
1 Northpoint Drive South Wing #B1-107 Singapore 768019

Upvotes: 2

Views: 54

Answers (1)

Wiktor Stribiżew

Reputation: 626747

You may use

(.+?)\s*(\d+.*Singapore \d{6})\b(?:\r?\n(\+65\s*\d{4}\s*\d{4}))?

Or just

(.+?)\s*(\d+.*Singapore \d{6})\b(?:\r?\n(\+65[\d ]*))?

See the regex demo.

Details

  • (.+?) - Group 1: any 1 or more chars other than linebreak chars, as few as possible
  • \s* - 0+ whitespaces
  • (\d+.*Singapore \d{6}) - Group 2: 1+ digits, any 0+ chars other than line break chars, as many as possible, Singapore and then six digits
  • \b - word boundary
  • (?:\r?\n(\+65\s*\d{4}\s*\d{4}))? - an optional sequence of
    • \r?\n - CRLF or LF line ending
    • (\+65\s*\d{4}\s*\d{4}) - Group 3: +65, 0+ whitespaces, 4 digits, 0+ whitespaces, 4 digits. In the second variant, (\+65[\d ]*) is used instead: +65 followed by 0 or more digits or spaces.

Three group contents per match:

Marina Bay Sands Relocated!
2 Bayfront Avenue Galleria Level #B1-01 Singapore 018972
+65 6634 9969

nex
23 Serangoon Central #B1-10 Singapore 556083
+65 6634 7787

Northpoint City
1 Northpoint Drive South Wing #B1-107 Singapore 768019
+65 6481 3433
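
For anyone who wants to try this outside the demo, here is a minimal sketch of applying the first pattern in Python. The answer does not name a language, so Python and the helper name extract_entries are assumptions made for illustration only; the sample text is the one from the question.

```python
import re

# First pattern from the answer: Group 1 = title, Group 2 = address,
# Group 3 = optional phone number on the following line.
PATTERN = re.compile(
    r"(.+?)\s*(\d+.*Singapore \d{6})\b(?:\r?\n(\+65\s*\d{4}\s*\d{4}))?"
)

# Sample text copied from the question.
TEXT = """Marina Bay Sands Relocated! 2 Bayfront Avenue Galleria Level #B1-01 Singapore 018972
+65 6634 9969
nex 23 Serangoon Central #B1-10 Singapore 556083
+65 6634 7787
Northpoint City 1 Northpoint Drive South Wing #B1-107 Singapore 768019
+65 6481 3433"""


def extract_entries(text):
    """Return one (title, address, phone) tuple per match.

    Hypothetical helper, not part of the original answer. The phone
    element is None when no +65 line follows the address.
    """
    return [match.groups() for match in PATTERN.finditer(text)]


if __name__ == "__main__":
    for title, address, phone in extract_entries(TEXT):
        print(title)
        print(address)
        print(phone)
        print()
```

Note that Group 1 is lazy, so it stops at the first digit that can start an address. A title that itself contains a digit would therefore be cut short at that digit.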

Upvotes: 1
