Reputation: 5122
I am using regular expressions in Python to search through a page source and find all the JSON information in the JavaScript. Specifically, an example would look something like this:
var fooData = { id: 123456789, name : "foo bar", country_name: "foo", country_is_eu: null, foo_bars: null, foo_email: null, foo_rate: 1.0, foo_id: 0987654321 };
I'm fairly new to regular expressions, and I'm not sure if what I'm doing is correct. I can get some individual lines, but I'm not completely sure how to use re.MULTILINE. This is the code I have right now:
prog = re.compile('[var ]?\w+ ?= ?{[^.*]+\n};', re.MULTILINE)
vars = prog.findall(text)
Why is this not working?
To be more clear, I really need it to match everything in between these brackets like this:
var fooData = { };
So, essentially I can't figure out a way to match every line except one that looks like this:
};
Upvotes: 3
Views: 867
Reputation:
This is what you are looking for, not including the brackets:
(?<=var fooData = {)[^}]+(?=};)
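As a rough sketch of how this lookaround pattern behaves in Python: the sample text below is an assumption (a multi-line version of the object from the question), and the braces are escaped only for readability. The lookbehind and lookahead check the surrounding text without including it in the match.

```python
import re

# Sample text modeled on the question; the multi-line layout is an assumption.
text = '''var fooData = {
    id: 123456789,
    name : "foo bar",
    foo_rate: 1.0
};'''

# The lookbehind (?<=...) and lookahead (?=...) assert the text around the
# match without consuming it, so only the body between the braces is returned.
pattern = re.compile(r'(?<=var fooData = \{)[^}]+(?=\};)')

match = pattern.search(text)
body = match.group(0) if match else None
print(body)
```

Note that this hard-codes the variable name and the exact spacing around the equals sign, and [^}]+ stops at the first closing brace, so it would not cope with nested objects.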
Upvotes: 2
Reputation: 5122
I got it! It turns out multiline mode was not even needed. I just matched all lines in between the brackets that didn't end in a semicolon (;). I also slightly modified the regex for finding the brackets and such. Here is my code:
re.findall(r'(?:var )?\w+[ ]?=[ ]?{\n(?:.+(?!(?<=;))\n)+};', text)
Thanks to X.Jacobs, I simplified (and fixed) my code to this:
re.findall(r'(?:var )?\w+\s*=\s*{[^;]+};', text)
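Here is a minimal, self-contained sketch of that last pattern run against some made-up page source. The sample text and the names barData and count are assumptions for illustration only. Since the pattern has no capturing groups, findall() returns each whole match.

```python
import re

# Hypothetical page source; only the fooData shape comes from the question.
text = '''<script>
var fooData = {
    id: 123456789,
    name : "foo bar",
    country_name: "foo"
};
barData = {
    id: 42
};
var count = 3;
</script>'''

# Optional "var ", a name, "=", an opening brace, then anything that is not
# a semicolon, and finally the closing "};".
pattern = re.compile(r'(?:var )?\w+\s*=\s*\{[^;]+\};')

blocks = pattern.findall(text)
for block in blocks:
    print(block)
    print('---')
```

One caveat: because the body is matched with [^;]+, a semicolon anywhere inside the object (for example inside a string value) would prevent that block from matching.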
Upvotes: 0
Reputation: 38253
When you're not sure, always consult the documentation (it's quite good for Python).
Multi-line mode changes how the caret (^) and the dollar sign ($) behave: instead of matching only at the beginning and end of the whole string, they also match at the beginning and end of each line (that is, immediately after and immediately before each newline character \n).
It looks like you are already accounting for this by having \n characters in your regex and by using the findall() function.
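A small sketch of the difference, using a made-up three-line string: the flag only changes where ^ and $ can match, so a pattern without either anchor gives the same result with or without it.

```python
import re

# Made-up sample text for illustration.
text = "first line\nsecond line\nthird line"

# Without the flag, ^ matches only at the very start of the string.
default_hits = re.findall(r'^\w+', text)

# With re.MULTILINE, ^ also matches right after each newline.
multiline_hits = re.findall(r'^\w+', text, re.MULTILINE)

# A pattern with no ^ or $ behaves the same with or without the flag.
no_anchor_plain = re.findall(r'\w+ line', text)
no_anchor_flag = re.findall(r'\w+ line', text, re.MULTILINE)

print(default_hits)
print(multiline_hits)
print(no_anchor_plain == no_anchor_flag)
```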
Upvotes: 0