Reputation: 37
I already saw this answer: How to get parentheses inside parentheses, but it doesn't really work when I don't know how many levels of nested parentheses there are.
I'm trying to deobfuscate a JS file with Python, and I have this kind of string that I want to "scrape":
String.fromCharCode
(
(010 * 12 + 6),
(06 * (0x1 * (1 * 0xa + 6) + 1) + 12),
(4 * 27 + 3),
(01 * 0x3b + 50),
(1 * 0x34 + 15),
(1 * (1 * (3 * ((0x1 * 8 + 7) * 1 + 0) + 8) + 24) + 27),
(0x1 * (2 * 0x25 + 7) + 16),
(1 * 0112 + 40),
(1 * 0x2c + 23),
(0x3 * 042 + 9),
(1 * ((05 * 4 + 1) * 03 + 0) + 37),
(0x2 * (1 * 0x1f + 4) + 31)
)
When I run re.findall(r"String.fromCharCode\((.+?)\)", content), the first result it returns is String.fromCharCode((03 * (07 * 4 + 3)
So it seems my regex stops at the first closing parenthesis it finds. I haven't tried the answer from the link above, but that approach doesn't seem to handle arbitrary depth: you have to know the number of nesting levels beforehand.
What I want to get is the whole parenthesized argument list, like this: ((010 * 12 + 6),(06 * (0x1 * (1 * 0xa + 6) + 1) + 12),(4 * 27 + 3),(01 * 0x3b + 50),(1 * 0x34 + 15),(1 * (1 * (3 * ((0x1 * 8 + 7) * 1 + 0) + 8) + 24) + 27),(0x1 * (2 * 0x25 + 7) + 16),(1 * 0112 + 40),(1 * 0x2c + 23),(0x3 * 042 + 9),(1 * ((05 * 4 + 1) * 03 + 0) + 37),(0x2 * (1 * 0x1f + 4) + 31))
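Once that text is captured, the obfuscation itself is only integer arithmetic. Below is a minimal sketch of decoding it, assuming the expressions contain nothing but integer literals, parentheses, +, - and *. The helper name decode_char_codes is hypothetical. It first rewrites JavaScript legacy octal literals such as 010, 042 and 0112 (which Python 3 rejects) into decimal, then evaluates each comma-separated expression by walking the tree from Python's standard ast module rather than calling eval() on untrusted input.

```python
import ast
import operator
import re

# Only the operators the obfuscator appears to use; anything else is rejected.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _eval_node(node):
    """Evaluate a tiny arithmetic AST (int literals, +, -, *) without eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def decode_char_codes(args: str) -> str:
    """Turn captured '((expr),(expr),...)' text into the string it encodes."""
    # JS legacy octal literals (010, 042, 0112) are invalid in Python 3,
    # so rewrite them as decimal. Hex literals (0x1f) already parse as-is.
    args = re.sub(r"\b0([0-7]+)\b", lambda m: str(int(m.group(1), 8)), args)
    # The outer parentheses with commas parse as a tuple of expressions.
    tree = ast.parse(args, mode="eval").body
    exprs = tree.elts if isinstance(tree, ast.Tuple) else [tree]
    return "".join(chr(_eval_node(e)) for e in exprs)


captured = (
    "((010 * 12 + 6),(06 * (0x1 * (1 * 0xa + 6) + 1) + 12),(4 * 27 + 3),"
    "(01 * 0x3b + 50),(1 * 0x34 + 15),"
    "(1 * (1 * (3 * ((0x1 * 8 + 7) * 1 + 0) + 8) + 24) + 27),"
    "(0x1 * (2 * 0x25 + 7) + 16),(1 * 0112 + 40),(1 * 0x2c + 23),"
    "(0x3 * 042 + 9),(1 * ((05 * 4 + 1) * 03 + 0) + 37),(0x2 * (1 * 0x1f + 4) + 31))"
)
print(decode_char_codes(captured))  # fromCharCode
```

For the sample in the question, the twelve expressions evaluate to 102, 114, 111, 109, 67, 104, 97, 114, 67, 111, 100, 101, which spell fromCharCode.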
EDIT:
To clarify, the code has many other occurrences of String.fromCharCode like the one above. If I remove the ? from the regex, the match runs on through the rest of the code instead of stopping at the end of this call.
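Both symptoms can be reproduced with the standard re module on a small one-line input. The content string below is a made-up sample shaped like the obfuscated file, not the real one, and the dot in fromCharCode is escaped here. The lazy pattern stops at the first ")", which closes an inner group; the greedy pattern runs to the last ")" on the line and swallows the second call. The built-in re module has no recursion, so it cannot count nesting depth either way.

```python
import re

# Hypothetical one-line input containing two calls.
content = (
    "a=String.fromCharCode((4 * 27 + 3),(1 * 0x34 + 15));"
    "b=String.fromCharCode((0x1 * 8 + 7));"
)

# Lazy: stops at the first ")" after the call's "(", i.e. inside the arguments.
lazy = re.findall(r"String\.fromCharCode\((.+?)\)", content)

# Greedy: backtracks only as far as the LAST ")" on the line.
greedy = re.findall(r"String\.fromCharCode\((.+)\)", content)

print(lazy)    # ['(4 * 27 + 3', '(0x1 * 8 + 7']
print(greedy)  # ['(4 * 27 + 3),(1 * 0x34 + 15));b=String.fromCharCode((0x1 * 8 + 7)']
```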
EDIT2:
I've written something that seems to work: https://pastebin.com/BVtD8R51
Upvotes: 1
Views: 125
Reputation: 43169
I wonder whether this is really the right way to tackle the problem, but you might get along with a recursive approach and the newer regex module:
String\.fromCharCode[^()]*
(
\(
(?:[^()]|(?1))*
\)
)
In Python, this could be:
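import regex as re  # third-party "regex" module (pip install regex); the stdlib re has no recursion

rx = re.compile(r'''
    String\.fromCharCode[^()]*   # the call name, plus anything before its first "("
    (                            # group 1: one balanced (...) block
        \(
        (?:[^()]|(?1))*          # a non-parenthesis char, or recurse into group 1
        \)
    )
''', re.VERBOSE)

# Sample input; replace with the contents of your JS file.
your_string_here = "x = String.fromCharCode((4 * 27 + 3),(1 * (1 * 0x34) + 15)); y = 1;"

for snippet in rx.finditer(your_string_here):
    print(snippet.group(0))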
Upvotes: 2
Reputation: 776
The + quantifier in Python is greedy by default, so it matches as much as it can. You've added a ? after it, which makes it non-greedy, so it stops at the first closing parenthesis. Take the ? out and it will no longer stop there; instead it runs to the last closing parenthesis it can find (on the same line, since . does not match newlines by default), which goes past the fromCharCode closing parenthesis whenever there are other closing parentheses further into your input.
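Because toggling the ? only chooses between stopping too early and running too far, another option, if installing the third-party regex module is not possible, is to find each call with the standard re module and then count parenthesis depth by hand. This is a sketch, not code from the thread: the helper name extract_call_args is hypothetical, and it assumes the JS contains no "(" or ")" inside string literals, regex literals or comments, since it counts raw characters.

```python
import re
from typing import List


def extract_call_args(source: str, name: str = "String.fromCharCode") -> List[str]:
    """Return the balanced "(...)" text that follows each occurrence of `name`."""
    results = []
    # Allow whitespace (including newlines) between the name and its "(".
    for m in re.finditer(re.escape(name) + r"\s*\(", source):
        start = m.end() - 1  # index of the call's opening "("
        depth = 0
        for i in range(start, len(source)):
            ch = source[i]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:  # back at the call's own level: its closing ")"
                    results.append(source[start:i + 1])
                    break
        # If the parentheses never balance, that occurrence is skipped.
    return results


content = (
    "a=String.fromCharCode((4 * 27 + 3),(1 * (1 * 0x34) + 15));"
    "b=String.fromCharCode\n((0x1 * 8 + 7));"
)
print(extract_call_args(content))
# ['((4 * 27 + 3),(1 * (1 * 0x34) + 15))', '((0x1 * 8 + 7))']
```

The depth counter works for any nesting level because it keeps a running count instead of encoding the levels in the pattern, which is what the recursive (?1) group in the regex-module answer achieves inside the regex engine.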
Upvotes: 0