Reputation: 67
/(Uni(t|ts)|Sho(p|ps)|Offic(e|es)|Fla(t|ts)?|Rm|Roo(m|ms)|Suit(e|es)).\w+(-|&)?\w*/gi
I am using the pattern above to find matches, but there are two cases it misses. How can I edit my regex so that it also finds "Rm. 2301" and "Flat/Room 5"?
Use case in an online editor: https://regex101.com/r/Sc1Feg/4
Unmatched cases:
Rm. 2301, Blk. B3-B4,
Flat/Room 5, 9/F,
Matched cases:
rm A, 17/F.,
Flat F, 9/F,
Flat G1, 10/F,
Flat C, 36/F, Block 1,
Flat 1107&1108, 11/F,
Flat 2301, 23/F, F
Unit 3, 2/F, L
Unit 1603 16/F
Offices D-F, 23/F,
Office D-F, 23/F,
Unit 1901, 19/F,
Units A, 6/F,
Shop 14, G/F,
Rooms 2202,
Suite 702, 7/F.,
Upvotes: 1
Views: 176
Reputation: 13376
How about something more generic, yet still specific enough, like /^[\w./]+\s+[\w&-]+/ ?
After all, there is a generic pattern here: two whitespace-separated character sequences, which can be matched as follows ...
^[\w./]+\s+
... right from the start of the line, match a run of at least one word, dot or slash character, up to and including the following whitespace (sequence) ...
[\w&-]+
... then the match continues with a run of at least one word, ampersand or minus character.
console.log(
`Rm. 2301, Blk. B3-B4,
Flat/Room 5, 9/F,
rm A, 17/F.,
Flat F, 9/F,
Flat G1, 10/F,
Flat C, 36/F, Block 1,
Flat 1107&1108, 11/F,
Flat 2301, 23/F, F
Unit 3, 2/F, L
Unit 1603 16/F
Offices D-F, 23/F,
Office D-F, 23/F,
Unit 1901, 19/F,
Units A, 6/F,
Shop 14, G/F,
Rooms 2202,
Suite 702, 7/F.,`.match(/^[\w./]+\s+[\w&-]+/gm));
Upvotes: 0
Reputation: 163362
About the pattern
The part Rm. 2301, Blk. B3-B4, does not match because the last part of the pattern that you tried has .\w+
The pattern matches Rm in the alternation, and the . in the pattern can also match the dot in the string, but then there is a space after Rm. which will not be matched by the following \w.
In the part Flat/Room 5 the space and 5 are not matched due to the same mechanism. Flat is matched in the alternation, the . in the pattern matches / and the \w+ matches Room, but the part (-|&)?\w* in the pattern does not match the space after it.
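To see the two failure modes described above, here is a minimal sketch that runs the pattern from the question against three of the sample lines. The variable names are only illustrative.

```javascript
// The pattern from the question, written as a JavaScript regex literal.
const questionPattern = /(Uni(t|ts)|Sho(p|ps)|Offic(e|es)|Fla(t|ts)?|Rm|Roo(m|ms)|Suit(e|es)).\w+(-|&)?\w*/gi;

const questionSamples = [
  "Rm. 2301, Blk. B3-B4,", // "." consumes the dot, then \w+ hits the space
  "Flat/Room 5, 9/F,",     // "." consumes "/", \w+ consumes "Room", nothing consumes " 5"
  "Flat G1, 10/F,",        // "." consumes the space, \w+ consumes "G1"
];

// With the g flag, match() returns all full matches, or null when there are none.
const questionResults = questionSamples.map((s) => s.match(questionPattern));
console.log(questionResults);
// [ null, [ 'Flat/Room' ], [ 'Flat G1' ] ]
```

So the first line is not matched at all, while the second is matched only up to Room.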
As all the example strings match till before the comma, one option is to match all that follows except a comma.
It is a broader match, but it might prevent creating a more complex pattern to account for all the variations.
\b(?:Units?|Shops?|Offices?|Flats?|Rm|Rooms?|Suites?)[^,\r\n]+
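As a quick check of the broader pattern, here is a small sketch. The gi flags are an assumption on my part (taken from the question), and the sample text is a subset of the lines in the question.

```javascript
// Broader pattern: keyword, then everything up to the next comma or line break.
// The "gi" flags are assumed here; the pattern above is given without flags.
const broadPattern = /\b(?:Units?|Shops?|Offices?|Flats?|Rm|Rooms?|Suites?)[^,\r\n]+/gi;

const broadText = [
  "Rm. 2301, Blk. B3-B4,",
  "Flat/Room 5, 9/F,",
  "Unit 1603 16/F",
].join("\n");

const broadMatches = broadText.match(broadPattern);
console.log(broadMatches);
// [ 'Rm. 2301', 'Flat/Room 5', 'Unit 1603 16/F' ]
```

The last result shows what "broader" means in practice: where a line has no comma after the number, the match runs on to the end of the line.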
Note that you can change a part like (p|ps) into matching a p followed by an optional s (ps?) and remove the group.
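A small sketch comparing the two spellings. Both test patterns are shortened, hypothetical versions of the real ones, with a literal space and \w+ appended so that something follows the keyword.

```javascript
// Grouped alternation as in the question vs. an optional trailing "s".
// Both patterns are cut down for illustration only.
const groupedForm = /(?:Uni(t|ts)|Sho(p|ps)) \w+/gi;
const optionalForm = /(?:Units?|Shops?) \w+/gi;

const keywordText = "Shop 14, Shops 15, Unit 3, Units A";

const groupedMatches = keywordText.match(groupedForm);
const optionalMatches = keywordText.match(optionalForm);
console.log(groupedMatches);  // [ 'Shop 14', 'Shops 15', 'Unit 3', 'Units A' ]
console.log(optionalMatches); // [ 'Shop 14', 'Shops 15', 'Unit 3', 'Units A' ]
```

The two agree here because the rest of the pattern forces the engine to backtrack into the longer alternative. Taken in isolation, Sho(p|ps) would stop at Shop inside Shops, since the first alternative already succeeds, whereas Shops? is greedy and takes the s.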
Upvotes: 1
Reputation: 8124
A perhaps more readable regex would be this:
/(Flat\/Room|Flat|Suite|Rooms|Rm\.|Rm|Shop|Units|Unit|Offices|Office) ([\w&-]+)/gmi
It captures the room type in group 1 and the number next to it (before the comma) in group 2.
Explanation:
(A|B|C): captures either A, B or C in group 1.
([\w&-]+): captures in group 2 a run of one or more characters, each of which is a word character (letter, digit or underscore), & or -.
Demo: https://regex101.com/r/Sc1Feg/5
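To read the two groups out in JavaScript, something like the following sketch should work. It uses String.prototype.matchAll, which requires the g flag, and the sample text is a subset of the lines from the question.

```javascript
const groupPattern = /(Flat\/Room|Flat|Suite|Rooms|Rm\.|Rm|Shop|Units|Unit|Offices|Office) ([\w&-]+)/gmi;

const groupText = [
  "Rm. 2301, Blk. B3-B4,",
  "Flat/Room 5, 9/F,",
  "Flat 1107&1108, 11/F,",
  "Offices D-F, 23/F,",
].join("\n");

// Each match object holds the full match at index 0 and the groups at 1 and 2.
const typeAndNumber = [...groupText.matchAll(groupPattern)].map((m) => [m[1], m[2]]);
console.log(typeAndNumber);
// [ [ 'Rm.', '2301' ], [ 'Flat/Room', '5' ], [ 'Flat', '1107&1108' ], [ 'Offices', 'D-F' ] ]
```

The order of the alternatives matters: Flat\/Room is listed before Flat, and Rm\. before Rm, so the longer form is tried first.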
Upvotes: 0
Reputation: 3891
Since Rm. 2301 contains a period and a space, this portion of the regex will not match:
.\w
To fix it, you can use the plus operator (+) so that it will match the period and the space. To prevent it from expanding the capture to the end of the line, you can also use the lazy operator (?):
.+?\w
So the final regex would be:
/(Uni(t|ts)|Sho(p|ps)|Offic(e|es)|Fla(t|ts)?|Rm|Roo(m|ms)|Suit(e|es)).+?\w+(-|&)?\w*/
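A minimal sketch to try this regex out. The gi flags are an assumption (carried over from the question), since the regex above is written without flags.

```javascript
// Same pattern as above, with the question's "gi" flags added (assumed).
const lazyPattern = /(Uni(t|ts)|Sho(p|ps)|Offic(e|es)|Fla(t|ts)?|Rm|Roo(m|ms)|Suit(e|es)).+?\w+(-|&)?\w*/gi;

const lazySamples = [
  "Rm. 2301, Blk. B3-B4,",
  "Flat/Room 5, 9/F,",
  "Flat 1107&1108, 11/F,",
];

const lazyResults = lazySamples.map((s) => s.match(lazyPattern));
console.log(lazyResults);
// [ [ 'Rm. 2301' ], [ 'Flat/Room' ], [ 'Flat 1107&1108' ] ]
```

Rm. 2301 is now matched in full. Flat/Room 5 still stops at Flat/Room, because the lazy .+? is satisfied by the slash alone and nothing after \w+ can consume the space, so that case would need one of the other approaches.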
Upvotes: 0