Luciano Closs

Reputation: 38

Python re.findall hangs and never returns

I have already done some research and found out about catastrophic backtracking, but I can't figure out whether that is what is happening here.

I have a small script:

import re

if __name__ == '__main__':
    name = 'vuejs-complete-guide-vue-course.vue.test'
    print( name )
    extractedDomain = re.findall(r'([A-Za-z0-9\-\_]+){1,63}.([A-Za-z0-9\-\_]+){1,63}$', name)
    print( extractedDomain )

This regex never finishes and I don't understand why.

But if the name is:

    name = 'vue-course.vue.test'

Then it works.

Can someone help me?

Upvotes: 0

Views: 50

Answers (1)

The fourth bird

Reputation: 163362

The issue is catastrophic backtracking caused by the nested quantifiers: the + on the character class sits inside a group that is itself repeated with {1,63}.

The character class does not include a dot, so a dot in your string can only be matched by the unescaped . in your pattern (which matches any character).

Your string contains 2 dots, but the pattern has only a single . before the final character class run and $. An attempt that starts at the beginning of the string therefore cannot succeed, yet the regex engine will still try to explore all the paths before giving up on it.

Ending the string with a dot, for example vuejs-complete., can also become problematic, as the pattern requires at least one character other than a dot after it.
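To get a feel for the difference between the two example names, the sketch below counts in how many ways ([A-Za-z0-9\-\_]+){1,63} can cut one run of allowed characters into consecutive non-empty pieces. It is only a rough illustration of the growth, not a model of what Python's regex engine does step by step; the helper name count_splits is made up for this example.

```python
from math import comb  # available in Python 3.8+


def count_splits(run_length, max_pieces=63):
    """Count the ways to cut a run of `run_length` characters into
    1..max_pieces non-empty consecutive pieces (hypothetical helper)."""
    # Choosing k pieces means choosing k - 1 cut points among the
    # run_length - 1 gaps between characters.
    upper = min(run_length, max_pieces)
    return sum(comb(run_length - 1, k - 1) for k in range(1, upper + 1))


# Text before the first dot in each example name from the question.
short_run = len('vue-course')                       # 10 characters
long_run = len('vuejs-complete-guide-vue-course')   # 31 characters

print(short_run, count_splits(short_run))  # 10 512
print(long_run, count_splits(long_run))    # 31 1073741824
```

A few hundred alternatives are tried almost instantly, while roughly a billion for a single starting position is consistent with the script appearing to hang.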


Looking at the pattern that you tried and the example string, you can repeat the character class 1-63 times, followed by a group that starts with a dot and is repeated 1 or more times.

Note that the dot has to be escaped to match it literally.

^[A-Za-z0-9_-]{1,63}(?:\.[A-Za-z0-9_-]{1,63})+$

Explanation

  • ^ Start of string
  • [A-Za-z0-9_-]{1,63} Repeat the character class 1-63 times
  • (?: Non capture group to repeat as a whole part
    • \.[A-Za-z0-9_-]{1,63} Match . and repeat the character class 1-63 times
  • )+ Close the group and repeat 1+ times
  • $ End of string
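As a minimal sketch of how the pattern above could be used in the script from the question: the names DOMAIN_RE and is_dotted_name are made up for this example, and the extra test strings are assumptions chosen to show a match and two non-matches.

```python
import re

# Pattern from this answer: runs of 1-63 allowed characters,
# separated by literal dots, anchored at both ends.
DOMAIN_RE = re.compile(r'^[A-Za-z0-9_-]{1,63}(?:\.[A-Za-z0-9_-]{1,63})+$')


def is_dotted_name(name):
    """Return True if the whole name matches the pattern (hypothetical helper)."""
    return DOMAIN_RE.match(name) is not None


if __name__ == '__main__':
    name = 'vuejs-complete-guide-vue-course.vue.test'
    # The pattern has no capture groups, so findall returns the full match.
    print(DOMAIN_RE.findall(name))

    for candidate in (name, 'vue-course.vue.test', 'vuejs-complete.', 'nodots'):
        print(candidate, is_dotted_name(candidate))
```

Because each character class is repeated directly with {1,63} instead of being wrapped in a repeated group, there is only one way to match each run, so the long name returns immediately.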


Upvotes: 2
