Reputation: 852
I am trying to parse the class name and the contents of a JS class with a regex, using Python to do the parsing. Here is the example code I am trying to parse. What I expect to get from the match is the list of class names and everything inside each class (all methods and variables).
class Rectangle {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}
class Square {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}
I wrote this pattern to match the above code:
class\s(.*)\{(.*)\}
But it does not match the way I expected. As far as I know, the regex is supposed to stop at the first closing curly brace, but instead it runs on to the closing curly brace of the second class. What am I doing wrong, and what is the correct way to solve this problem?
Upvotes: 0
Views: 105
Reputation: 91488
Use the regex library from PyPI, which supports recursive patterns; this will work for any number of nested sub-blocks:
import regex
strin = '''
class Rectangle {
constructor(height, width) {
this.height = height;
this.width = width;
}
}
class Square {
constructor(height, width) {
this.height = height;
this.width = width;
}
}
'''
res = regex.findall(r'(class\s+\w+\s+({(?:[^{}]+|(?2))*}))', strin)
# Each result is a tuple: (whole class, brace-delimited body)
print(res[0][0])
print('----------------------------------------')
print(res[1][0])
Output:
class Rectangle {
constructor(height, width) {
this.height = height;
this.width = width;
}
}
----------------------------------------
class Square {
constructor(height, width) {
this.height = height;
this.width = width;
}
}
Demo & explanation (using PCRE, because regex101 doesn't support the regex module)
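If installing a third-party module is not an option, the same balanced-brace idea can be approximated with the standard library by counting brace depth by hand. This is a different technique from the recursive pattern above, not a reimplementation of it, and it is only a rough sketch: the helper name extract_classes is made up for illustration, the header pattern assumes a plain "class Name {" declaration (no extends clause), and braces inside strings, comments, or template literals would confuse it.

```python
import re

# Assumed header shape: "class Name {" with no "extends" clause.
HEADER = re.compile(r'\bclass\s+(\w+)\s*\{')

def extract_classes(source):
    """Return (name, body) pairs for each top-level class in `source`."""
    classes = []
    pos = 0
    while True:
        header = HEADER.search(source, pos)
        if header is None:
            break
        depth = 1            # the header already consumed one "{"
        i = header.end()
        while i < len(source) and depth > 0:
            if source[i] == '{':
                depth += 1
            elif source[i] == '}':
                depth -= 1
            i += 1
        if depth != 0:
            break            # unbalanced braces: stop rather than guess
        # The body excludes the outer pair of braces.
        classes.append((header.group(1), source[header.end():i - 1]))
        pos = i              # resume after this class's closing brace
    return classes

sample = """
class Rectangle {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}
class Square {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}
"""

classes = extract_classes(sample)
for name, body in classes:
    print(name)
```

Resuming the search after each closing brace means a class keyword appearing inside a class body is not picked up as a new top-level class.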
Upvotes: 2
Reputation: 265
TL;DR: class\s+(.+?)\{(.+?)\n\} should do the trick (in Python, pass re.DOTALL so that . also matches newlines).

There are two issues with your attempted solution:

1. You are using a greedy quantifier ((.*)\{) for the class name capture group, where a lazy quantifier is needed ((.+?)\{). This is causing the capture group to spill over until the final occurrence of \{.
2. You need some additional context ((.+?)\n\}) to dictate when the capture group for the class body ends. This will only work for formatted code where a class clearly ends in \n}, since all other instances of } will be preceded by indentation. I don't believe it's possible to create a regex that can separate a class body in the general case, unfortunately.

Edit: I also replaced your *s with +s where I think it's appropriate, in order to assert that at least one character must appear in the class name and in the class body.
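To make the greedy versus lazy difference concrete, here is a small sketch using Python's built-in re module. It assumes the sample is indented as in normally formatted code, so that only the class-closing brace sits at the start of a line, and it passes re.DOTALL so that . can cross newlines. The variable names are illustrative only.

```python
import re

source = """class Rectangle {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}
class Square {
  constructor(height, width) {
    this.height = height;
    this.width = width;
  }
}
"""

# Greedy pattern from the question: (.*) runs as far as it can, so the
# first group spills over into the second class.
greedy = re.findall(r'class\s(.*)\{(.*)\}', source, flags=re.DOTALL)

# Lazy pattern from this answer: each group stops as early as possible,
# and the body ends at the first "}" that starts a line.
lazy = re.findall(r'class\s+(.+?)\{(.+?)\n\}', source, flags=re.DOTALL)

print(len(greedy))            # number of matches for the greedy pattern
for name, body in lazy:
    print(name.strip())       # class names found by the lazy pattern
```

The captured name keeps the space before the opening brace, hence the strip() call. If the input is not indented, the lazy pattern would end each body at the first method's closing brace instead.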
Upvotes: 2