user2175693

Reputation: 23

Converting regex python to javascript

I am very new to regex, and I have spent a long time searching for the JavaScript equivalents of the Python syntax below. I would love it if somebody responded with a detailed explanation of this regex, converted from Python to JavaScript.

import re

regex = r"""
    ^(
      (?P<ShowNameA>.*[^ (_.]) # Show name
        [ (_.]+
        ( # Year with possible Season and Episode
          (?P<ShowYearA>\d{4})
          ([ (_.]+S(?P<SeasonA>\d{1,2})E(?P<EpisodeA>\d{1,2}))?
        | # Season and Episode only
          (?<!\d{4}[ (_.])
          S(?P<SeasonB>\d{1,2})E(?P<EpisodeB>\d{1,2})
        | # Alternate format for episode
          (?P<EpisodeC>\d{3})
        )
    |
      # Show name with no other information
      (?P<ShowNameB>.+)
    )
    """

test_str = ("archer.2009.S04E13\n"
    "space 1999 1975\n"
    "Space: 1999 (1975)\n"
    "Space.1999.1975.S01E01\n"
    "space 1999.(1975)\n"
    "The.4400.204.mkv\n"
    "space 1999 (1975)\n"
    "v.2009.S01E13.the.title.avi\n"
    "Teen.wolf.S04E12.HDTV.x264\n"
    "Se7en\n"
    "Se7en.(1995).avi\n"
    "How to train your dragon 2\n"
    "10,000BC (2010)")

matches = re.finditer(regex, test_str, re.MULTILINE | re.VERBOSE)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

Regex101

Upvotes: 1

Views: 7198

Answers (1)

Jeff Hykin

Reputation: 2627

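Sadly, there is no fully automatic way to convert a Python regex to a JavaScript regex, because Python's regex syntax offers conveniences that JavaScript lacks.

JavaScript has no verbose (free-spacing) syntax, so whitespace and comments cannot be placed inside a pattern. JavaScript engines older than ES2018 also lack lookbehinds and named capturing groups; newer engines support both, but write named groups as (?<Name>...) rather than Python's (?P<Name>...). Neither Python's built-in re module nor JavaScript supports recursion.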

regular capture group = ()
named capture group = (?P<ThisIsAName>)

verbose regex = 'find me #this regex ignores comments and whitespace'
non verbose regex = 'this treats whitespace literally'

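So we convert your named capture groups to regular (numbered) capture groups and collapse the verbose syntax into regular syntax. The result is a valid JavaScript regex in any engine that supports lookbehind (ES2018 and later), since the pattern still contains the negative lookbehind (?<!\d{4}[ (_.]). The m flag replaces re.MULTILINE, and the g flag lets you find every match, as re.finditer does. In JavaScript it would look like:
regex = /^((.*[^ (_.])[ (_.]+((\d{4})([ (_.]+S(\d{1,2})E(\d{1,2}))?|(?<!\d{4}[ (_.])S(\d{1,2})E(\d{1,2})|(\d{3}))|(.+))/gm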

// group 2 = ShowNameA
// group 4 = ShowYearA
// group 6 = SeasonA
// group 7 = EpisodeA
// group 8 = SeasonB
// group 9 = EpisodeB
// group 10 = EpisodeC
// group 11 = ShowNameB
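Counting the opening parentheses from left to right gives the group numbers, and a small lookup table can turn them back into the original names. The following is only a sketch: GROUP_NAMES and parseTitle are hypothetical names made up for illustration, and the lookbehind assumes an ES2018 or later engine.

```javascript
// Numbered-group version of the pattern. No g flag here, so exec() is stateless.
const numberedRegex = /^((.*[^ (_.])[ (_.]+((\d{4})([ (_.]+S(\d{1,2})E(\d{1,2}))?|(?<!\d{4}[ (_.])S(\d{1,2})E(\d{1,2})|(\d{3}))|(.+))/m;

// Hypothetical lookup table: group number -> name used in the Python pattern.
// Groups 1, 3 and 5 are unnamed wrapper groups and are skipped.
const GROUP_NAMES = {
  2: 'ShowNameA',
  4: 'ShowYearA',
  6: 'SeasonA',
  7: 'EpisodeA',
  8: 'SeasonB',
  9: 'EpisodeB',
  10: 'EpisodeC',
  11: 'ShowNameB',
};

// Hypothetical helper: returns an object holding only the groups that matched.
function parseTitle(line) {
  const match = numberedRegex.exec(line);
  if (match === null) return null;
  const result = {};
  for (const [num, name] of Object.entries(GROUP_NAMES)) {
    if (match[num] !== undefined) result[name] = match[num];
  }
  return result;
}

console.log(parseTitle('archer.2009.S04E13'));
console.log(parseTitle('The.4400.204.mkv'));
console.log(parseTitle('Se7en'));
```

Groups that did not take part in a match are undefined in JavaScript, which corresponds to None in Python.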

As you can see, the JavaScript version is pretty ugly because it lacks verbose syntax and this version drops the group names. In this case, however, it is functionally equivalent to the Python original.
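If the target engine supports ES2018, some of the readability can be recovered. The sketch below is one possible approach rather than a standard idiom: it builds the pattern from commented string pieces as a rough stand-in for verbose mode and uses JavaScript's (?<Name>...) named groups. The names SEP, patternParts and namedRegex are assumptions chosen for illustration.

```javascript
// Separator class shared by several pieces of the pattern.
const SEP = '[ (_.]';

// Each piece is a plain string, so backslashes are doubled.
// The trailing comments play the role of Python's inline verbose comments.
const patternParts = [
  '^(',
  '(?<ShowNameA>.*[^ (_.])',                                // show name
  SEP + '+',
  '(',                                                      // year with possible season and episode
  '(?<ShowYearA>\\d{4})',
  '(' + SEP + '+S(?<SeasonA>\\d{1,2})E(?<EpisodeA>\\d{1,2}))?',
  '|',                                                      // season and episode only
  '(?<!\\d{4}' + SEP + ')',
  'S(?<SeasonB>\\d{1,2})E(?<EpisodeB>\\d{1,2})',
  '|',                                                      // alternate format for episode
  '(?<EpisodeC>\\d{3})',
  ')',
  '|',
  '(?<ShowNameB>.+)',                                       // show name with no other information
  ')',
];

// The m flag stands in for re.MULTILINE.
const namedRegex = new RegExp(patternParts.join(''), 'm');

const namedMatch = namedRegex.exec('Teen.wolf.S04E12.HDTV.x264');
console.log(namedMatch.groups.ShowNameA, namedMatch.groups.SeasonB, namedMatch.groups.EpisodeB);
```

Named groups that did not participate in the match are still present on match.groups, with the value undefined.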

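JavaScript has no function named findall or finditer, so you'll have to make or find an equivalent. In ES2020 and later, String.prototype.matchAll with a g-flagged regex is the closest match; in older engines you loop over RegExp.prototype.exec yourself. Here is an article explaining several such ways: https://www.activestate.com/blog/2008/04/javascript-refindall-workalike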
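As a rough counterpart to the Python re.finditer loop in the question, the sketch below uses String.prototype.matchAll, which assumes an ES2020 or later engine and a regex carrying the g flag. The helper name findAllMatches and the shortened test string are made up for illustration.

```javascript
// The g flag is required by matchAll; the m flag stands in for re.MULTILINE.
const titleRegex = /^((.*[^ (_.])[ (_.]+((\d{4})([ (_.]+S(\d{1,2})E(\d{1,2}))?|(?<!\d{4}[ (_.])S(\d{1,2})E(\d{1,2})|(\d{3}))|(.+))/gm;

// Shortened version of the test string from the question.
const testStr = [
  'archer.2009.S04E13',
  'Teen.wolf.S04E12.HDTV.x264',
  'Se7en',
].join('\n');

// Hypothetical helper: collects start offset, end offset and text of every match.
function findAllMatches(re, text) {
  const results = [];
  for (const match of text.matchAll(re)) {
    results.push({
      start: match.index,
      end: match.index + match[0].length, // JavaScript has no match.end()
      text: match[0],
      groups: match.slice(1),             // numbered groups 1..n
    });
  }
  return results;
}

findAllMatches(titleRegex, testStr).forEach((m, i) => {
  console.log(`Match ${i + 1} was found at ${m.start}-${m.end}: ${m.text}`);
});
```

matchAll works on a copy of the regex, so the lastIndex of titleRegex is not modified between calls.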

In the future, I also highly recommend going to regexr.com to learn regex, specifically JavaScript regex.

Upvotes: 3
