pakwai122

Reputation: 67

regular expression in address pattern

/(Uni(t|ts)|Sho(p|ps)|Offic(e|es)|Fla(t|ts)?|Rm|Roo(m|ms)|Suit(e|es)).\w+(-|&)?\w*/gi

I am using the pattern above to find matches, but there are two cases it does not find. How can I change my regex so that it also matches "Rm. 2301" and "Flat/Room 5"?

use case in online editor https://regex101.com/r/Sc1Feg/4

unmatched cases
Rm. 2301, Blk. B3-B4, 
Flat/Room 5, 9/F, 

matched cases
rm A, 17/F., 
Flat F, 9/F, 
Flat G1, 10/F, 
Flat C, 36/F, Block 1, 
Flat 1107&1108, 11/F, 
Flat 2301, 23/F, F
Unit 3, 2/F, L
Unit 1603 16/F 
Offices D-F, 23/F, 
Office D-F, 23/F, 
Unit 1901, 19/F, 
Units A, 6/F, 
Shop 14, G/F, 
Rooms 2202, 
Suite 702, 7/F.,

Upvotes: 1

Views: 176

Answers (4)

Peter Seliger

Reputation: 13376

How about something more generic, yet still specific enough, such as /^[\w./]+\s+[\w&-]+/ ?

After all, every line follows one generic pattern: two character sequences separated by whitespace, which can be matched as follows:

  • ^[\w./]+\s+ ... from the start of the line, match one or more word, dot, or slash characters, followed by at least one whitespace character.
  • [\w&-]+ ... then continue matching one or more word, ampersand, or hyphen characters.

console.log(
`Rm. 2301, Blk. B3-B4, 
Flat/Room 5, 9/F, 
rm A, 17/F., 
Flat F, 9/F, 
Flat G1, 10/F, 
Flat C, 36/F, Block 1, 
Flat 1107&1108, 11/F, 
Flat 2301, 23/F, F
Unit 3, 2/F, L
Unit 1603 16/F 
Offices D-F, 23/F, 
Office D-F, 23/F, 
Unit 1901, 19/F, 
Units A, 6/F, 
Shop 14, G/F, 
Rooms 2202, 
Suite 702, 7/F.,`.match(/^[\w./]+\s+[\w&-]+/gm));

Upvotes: 0

The fourth bird

Reputation: 163362

About the pattern

  • The part Rm. 2301, Blk. B3-B4, does not match because the pattern you tried continues with .\w+ after the alternation.

    The alternation matches Rm and the . in the pattern matches the dot in the string, but the next character is the space after Rm. and the following \w+ cannot match it.

  • In Flat/Room 5 the space and the 5 are not matched for the same reason.

    Flat is matched by the alternation, the . in the pattern matches / and \w+ matches Room, but the remaining (-|&)?\w* cannot match the space that follows.
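To make both failures concrete, here is a small sketch that runs the pattern from the question against the two problem strings plus one string that works. The variable names are only illustrative, and the pattern is assumed to be written as a JavaScript regex literal with the g and i flags from the question.

```javascript
// The pattern from the question as a JavaScript regex literal (flags g and i assumed).
const original = /(Uni(t|ts)|Sho(p|ps)|Offic(e|es)|Fla(t|ts)?|Rm|Roo(m|ms)|Suit(e|es)).\w+(-|&)?\w*/gi;

const samples = [
  "Rm. 2301, Blk. B3-B4, ", // "." consumes the dot, then \w+ meets a space
  "Flat/Room 5, 9/F, ",     // "." consumes "/", \w+ consumes "Room", nothing consumes " 5"
  "Flat G1, 10/F, ",        // "." consumes the space, \w+ consumes "G1"
];

// With the g flag, String.prototype.match returns every match, or null if there is none.
const originalResults = samples.map((s) => s.match(original));

console.log(originalResults);
// → [ null, [ 'Flat/Room' ], [ 'Flat G1' ] ]
```

So the first string is not matched at all, and the second is only matched as far as Flat/Room.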


Since all the example strings should match up to the comma, one option is to match the keyword and then everything that follows except a comma or a newline.

It is a broader match, but it avoids building a more complex pattern that accounts for every variation.

\b(?:Units?|Shops?|Offices?|Flats?|Rm|Rooms?|Suites?)[^,\r\n]+

Regex demo
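As a rough usage sketch, this is how the pattern above could be applied in JavaScript. The g and i flags are carried over from the question as an assumption, and the sample lines and variable names are only for illustration.

```javascript
// Keyword, then everything up to (but not including) the next comma or line break.
const broad = /\b(?:Units?|Shops?|Offices?|Flats?|Rm|Rooms?|Suites?)[^,\r\n]+/gi;

const lines = [
  "Rm. 2301, Blk. B3-B4, ",
  "Flat/Room 5, 9/F, ",
  "rm A, 17/F., ",
  "Unit 1603 16/F ",
  "Offices D-F, 23/F, ",
].join("\n");

const broadMatches = lines.match(broad);

console.log(broadMatches);
// → [ 'Rm. 2301', 'Flat/Room 5', 'rm A', 'Unit 1603 16/F ', 'Offices D-F' ]
```

The fourth result shows the broader nature of the match: with no comma on that line, everything up to the line break is included, trailing space and all.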

Note that a part like (p|ps) can be rewritten as ps?, a p followed by an optional s, which removes the need for the group.

Upvotes: 1

adelriosantiago

Reputation: 8124

A perhaps more readable regex would be this:

/(Flat\/Room|Flat|Suite|Rooms|Rm\.|Rm|Shop|Units|Unit|Offices|Office) ([\w&-]+)/gmi

It captures the room type in group 1 and the number that follows it (up to the comma) in group 2.

Explanation:

  • (A|B|C): Captures either A, B or C in group 1.
  • ([\w&-]+): Captures in group 2 one or more characters that are word characters (letters, digits, underscore), & or -.

Demo: https://regex101.com/r/Sc1Feg/5
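If the two groups are needed separately, a minimal sketch with String.prototype.matchAll could look like this. The property names type and number and the sample lines are assumptions made for the example.

```javascript
// Group 1 is the room type, group 2 is the identifier after the single space.
const grouped = /(Flat\/Room|Flat|Suite|Rooms|Rm\.|Rm|Shop|Units|Unit|Offices|Office) ([\w&-]+)/gim;

const text = [
  "Rm. 2301, Blk. B3-B4, ",
  "Flat/Room 5, 9/F, ",
  "Flat 1107&1108, 11/F, ",
  "Offices D-F, 23/F, ",
].join("\n");

// matchAll requires the g flag and yields one match object per hit.
const pairs = Array.from(text.matchAll(grouped), (m) => ({ type: m[1], number: m[2] }));

console.log(pairs);
// → [
//   { type: 'Rm.', number: '2301' },
//   { type: 'Flat/Room', number: '5' },
//   { type: 'Flat', number: '1107&1108' },
//   { type: 'Offices', number: 'D-F' }
// ]
```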

Upvotes: 0

Mark Skelton

Reputation: 3891

Since Rm. 2301 contains a period and a space, this portion of the regex will not match:

.\w

To fix it, you can add the + quantifier so that . matches both the period and the space. To keep it from expanding the match to the end of the line, you can also make the quantifier lazy with ?.

.+?\w

So the final regex would be:

/(Uni(t|ts)|Sho(p|ps)|Offic(e|es)|Fla(t|ts)?|Rm|Roo(m|ms)|Suit(e|es)).+?\w+(-|&)?\w*/
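A quick check of this regex against the two problem strings from the question, as a sketch. The g and i flags are an assumption taken from the question, and the variable names are only illustrative.

```javascript
// The regex above with the flags from the question added (an assumption).
const lazy = /(Uni(t|ts)|Sho(p|ps)|Offic(e|es)|Fla(t|ts)?|Rm|Roo(m|ms)|Suit(e|es)).+?\w+(-|&)?\w*/gi;

// ".+?" grows one character at a time until \w+ can match: ". " and then "2301".
const rmResult = "Rm. 2301, Blk. B3-B4, ".match(lazy);

// Here ".+?" is already satisfied by "/", and \w+ takes "Room".
const flatResult = "Flat/Room 5, 9/F, ".match(lazy);

console.log(rmResult);   // → [ 'Rm. 2301' ]
console.log(flatResult); // → [ 'Flat/Room' ]
```

So Rm. 2301 is now found in full, while Flat/Room 5 is still matched only up to Flat/Room, because nothing after \w+ can consume the space before the 5.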

Upvotes: 0
