user7489400
user7489400

Reputation:

Where is the error leading to the infinite loop in this RegExp?

Here is the call in JS:

'Москва, Щёлковское шоссе д.3 строение 1 - Торговый Центр Город Хобби - 3 этаж, павильон 310, 105122'.match(/(?:[а-яА-ЯёЁ0-9\-\.\(\)]+\s?)+,?\s?(?:[а-яА-ЯёЁ0-9\-\.\(\)]+\s?)+(?:г|гор|город|п|пос|поселок|д|дер|деревня|ул|улица|пр|просп|пр-т|проспект|п|пл|площадь|ал|аллея|бул|б-р|дор|наб|пер|пр|пр-д|туп|ш|шос|шоссе|к|кор|корп|корпус|стр|строение|зд|зд-е|здание|вор|ворота|д|дом|секция)\.?\s?,?\s?(?:г|гор|город|п|пос|поселок|д|дер|деревня|ул|улица|пр|просп|пр-т|проспект|п|пл|площадь|ал|аллея|бул|б-р|дор|наб|пер|пр|пр-д|туп|ш|шос|шоссе|к|кор|корп|корпус|стр|строение|зд|зд-е|здание|вор|ворота|д|дом|секция)\.?\s?[а-яА-Я0-9ёЁ№\-\/()]+,?\s?(?:г|гор|город|п|пос|поселок|д|дер|деревня|ул|улица|пр|просп|пр-т|проспект|п|пл|площадь|ал|аллея|бул|б-р|дор|наб|пер|пр|пр-д|туп|ш|шос|шоссе|к|кор|корп|корпус|стр|строение|зд|зд-е|здание|вор|ворота|д|дом|секция)\.?\s?[а-яА-Я0-9ёЁ№\-\/()]+/gi)

It leads to infinite call, so console will be stuck. I think here is something wrong, but can't find this error.

Also, I think it's not an error in V8, so guys, look at this pls. Looks like the bottleneck is the last part [а-яА-Я0-9ёЁ№\-\/()]+ that leads to searching for the matches in loop.

How can I fix it?

Upvotes: 0

Views: 64

Answers (1)

Tim Pietzcker
Tim Pietzcker

Reputation: 336478

This problem is called catastrophic backtracking, resulting from nested quantifiers:

Let's reduce this part

(?:[а-яА-ЯёЁ0-9\-\.\(\)]+\s?)+

to something more readable:

(?:[0-9]+\s?)+

Now imagine a number like 12345. The above regex has several options of matching those:

12345
1234, 5
123, 45
123, 4, 5
12, 345
12, 3, 45
12, 34, 5
12, 3, 4, 5
[...]

and it must try all of these if the regex fails to match initially - after all, maybe a different combination might work that we haven't tried yet.

So, avoid nested quantifiers, or make sure that they don't allow all those permutations, for example by not making the \s optional:

(?:[а-яА-ЯёЁ0-9\-\.\(\)]+\s)+

or by adding the space to your character class:

[а-яА-ЯёЁ0-9\-.()\s]+

making the nesting unnecessary.

Upvotes: 2

Related Questions