Naveen

Reputation: 852

How can I use regex to parse JS class names along with the code wrapped inside them?

I am trying to parse the class name and the contents of a JS class with regex, using Python to do the parsing. Here is the example code I am trying to parse. What I expect to get from the match is the list of class names and everything inside each class (all methods and variables).

class Rectangle {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}

class Square {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}

I wrote this pattern to match the above code:

class\s(.*)\{(.*)\}

But it does not match the way I expect.

As far as I can tell, the match that was supposed to stop at the first class's closing curly brace instead stops at the second class's closing curly brace. What am I doing wrong, and what is the correct way to solve this problem?

Upvotes: 0

Views: 105

Answers (2)

Toto

Reputation: 91488

Use the regex library from PyPI, which supports recursive patterns; this will work for any number of nested sub-blocks:

import regex

strin = '''
class Rectangle {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}

class Square {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}
'''
res = regex.findall(r'(class\s+\w+\s+({(?:[^{}]+|(?2))*}))', strin)
print(res[0][0])
print('----------------------------------------')
print(res[1][0])

Output:

class Rectangle {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}
----------------------------------------
class Square {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}

Demo & explanation (using PCRE, because regex101 doesn't support the regex module)
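If installing the regex module is not an option, the balanced-brace matching that the recursive (?2) performs can be approximated with the standard library by counting brace depth. This is only a sketch and not part of the answer above: the function name extract_classes and the sample string js are made up for illustration, and it assumes that braces never appear inside strings, comments, or template literals (the same assumption the recursive pattern makes).

```python
import re

def extract_classes(source):
    """Return (name, full_text) pairs for each `class Name { ... }` block found."""
    results = []
    # Find every class header, then scan forward from its opening brace.
    for header in re.finditer(r'class\s+(\w+)\s*\{', source):
        depth = 0
        # header.end() - 1 is the index of the "{" matched by the header.
        for i in range(header.end() - 1, len(source)):
            if source[i] == '{':
                depth += 1
            elif source[i] == '}':
                depth -= 1
                if depth == 0:
                    # Matching closing brace reached: keep the whole block.
                    results.append((header.group(1), source[header.start():i + 1]))
                    break
    return results

# Hypothetical sample input with nested blocks on a single line.
js = "class A { m() { if (x) { y(); } } }\nclass B { n() {} }"
for name, text in extract_classes(js):
    print(name)
    print(text)
```

Because it tracks depth instead of relying on layout, this does not depend on how the JS code is indented.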

Upvotes: 2

George D.

Reputation: 265

TL;DR: class\s+(.+?)\{(.+?)\n\} should do the trick (in Python, pass re.DOTALL so that . also matches newlines)

There are two issues with your attempted solution:

  1. You are using a greedy quantifier ((.*)\{) for the class name capture group, where a lazy quantifier is needed ((.+?)\{). This causes the capture group to spill over until the final occurrence of \{.
  2. You also need a lazy quantifier ((.+?)\n\}) to dictate where the capture group for the class body ends. This will only work for formatted code where a class clearly ends with a } at the start of a line (\n}), since all other instances of } will be preceded by indentation. I don't believe it's possible to create a regex that can separate a class body in the general case, unfortunately.

Edit: I also replaced your * quantifiers with + where I think it's appropriate, in order to assert that at least one character must appear in the class name and in the class body.
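To make the difference concrete, here is a small sketch that runs both patterns over the code from the question with Python's built-in re module. The variable names source, greedy, and lazy are just for illustration, and re.DOTALL is assumed so that . can cross line breaks.

```python
import re

source = '''
class Rectangle {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}

class Square {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}
'''

# Greedy pattern from the question: (.*) runs on to the last "{" and the
# last "}", so both classes are swallowed by a single match.
greedy = re.findall(r'class\s(.*)\{(.*)\}', source, flags=re.DOTALL)

# Lazy pattern: the name stops at the first "{" and the body stops at the
# first "}" that directly follows a newline (an unindented closing brace).
lazy = re.findall(r'class\s+(.+?)\{(.+?)\n\}', source, flags=re.DOTALL)

print(len(greedy), "match with the greedy pattern")
for name, body in lazy:
    print(name.strip())
    print(body)
```

The captured name keeps the space before {, hence the strip() call; using (\w+)\s*\{ for the name would avoid that.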

Upvotes: 2
