jackcogdill

Reputation: 5122

Using Multiline Regular Expressions in Python?

I am using regular expressions in Python to search through a page's source and find all the JSON-like objects in its JavaScript. A specific example looks something like this:

var fooData = {
    id: 123456789,
    name : "foo bar",
    country_name: "foo",
    country_is_eu: null,
    foo_bars: null,
    foo_email: null,
    foo_rate: 1.0,
    foo_id: 0987654321
};

I'm fairly new to regular expressions, and I'm not sure whether what I'm doing is correct. I can match some individual lines, but I'm not sure how to use re.MULTILINE. This is the code I have right now:

prog = re.compile('[var ]?\w+ ?= ?{[^.*]+\n};', re.MULTILINE)
vars = prog.findall(text)

Why is this not working?

To be clearer, I need it to match everything between the braces, like this:

var fooData = {

};

Essentially, I can't figure out how to match every line except one that looks like this:

};

Upvotes: 3

Views: 867

Answers (3)

user1006989


This is what you are looking for, excluding the braces themselves:

(?<=var fooData = {)[^}]+(?=};)
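A minimal sketch of that pattern in use, assuming a made-up page snippet. The lookbehind (?<=...) and lookahead (?=...) are zero-width, so they must match around the body but are not part of the returned text. Python's re requires a lookbehind to be fixed-width, which is why the variable name is hard-coded; the second call is an alternative (not from the answer) that uses capturing groups to handle any name.

```python
import re

# Hypothetical page snippet for illustration.
text = '''<script>
var fooData = {
    id: 123456789,
    name : "foo bar",
    foo_rate: 1.0
};
</script>'''

# Lookarounds check the surrounding text without consuming it,
# so only the body between "{" and "};" is returned.
pattern = r'(?<=var fooData = {)[^}]+(?=};)'
body = re.search(pattern, text).group(0)
print(repr(body))

# A lookbehind must be fixed-width in Python's re, so (?<=var \w+ = {)
# would be rejected. Capturing groups work for any variable name.
general = re.findall(r'var\s+(\w+)\s*=\s*{([^}]*)};', text)
print(general[0][0])  # fooData
```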

Upvotes: 2

jackcogdill

Reputation: 5122

I got it! It turns out multiline mode wasn't even needed; I just matched all the lines between the braces that didn't end in a semicolon. I also slightly modified the part of the regex that finds the braces. Here is my code:

re.findall(r'(?:var )?\w+[ ]?=[ ]?{\n(?:.+(?!(?<=;))\n)+};', text)
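Here is a commented sketch of that line-by-line pattern, run on a hypothetical snippet. Because . does not match a newline, each repetition of the group consumes exactly one body line, and the (?!(?<=;)) check rejects a line whose last character is a semicolon, so the repetition stops before the closing "};" line.

```python
import re

# Hypothetical snippet: one object literal followed by a plain statement.
text = '''var fooData = {
    id: 123456789,
    name : "foo bar",
    foo_id: 0987654321
};
var other = 5;'''

# {\n          the opening brace must end its line
# .+           one whole body line ('.' never matches '\n')
# (?!(?<=;))   ...as long as the character just consumed is not ';'
# \n           that line's newline; the group repeats once per line
# };           the closing line
pattern = r'(?:var )?\w+[ ]?=[ ]?{\n(?:.+(?!(?<=;))\n)+};'

matches = re.findall(pattern, text)
print(len(matches))  # 1
print(matches[0])
```

The (?!(?<=;)) construct behaves the same as the shorter negative lookbehind (?<!;) at that position; either one only inspects the character before the current position.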

Thanks to X.Jacobs, I simplified (and fixed) my code to this:

re.findall(r'(?:var )?\w+\s*=\s*{[^;]+};', text)
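A quick sketch of the simplified pattern against a made-up page containing several assignments. [^;]+ runs greedily from the opening brace up to the first semicolon, then gives back one character so that the literal }; can match. The last assignment is included to show a limitation: a semicolon inside a string value stops [^;]+ early, so that object is not found.

```python
import re

# Hypothetical page text for illustration.
page = '''var fooData = {
    id: 123456789,
    foo_rate: 1.0
};
barData = { enabled: true };
var note = { text: "a; b" };'''

# (?:var )? makes the keyword optional; \s* tolerates any spacing
# (including newlines) around '='.
pattern = r'(?:var )?\w+\s*=\s*{[^;]+};'

found = re.findall(pattern, page)
for item in found:
    print(item)
# The "note" object is skipped: its string value contains ';'.
```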

Upvotes: 0

Alex W

Reputation: 38253

When you're not sure, always consult the documentation (it's quite good for Python).

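Multiline mode (re.MULTILINE) changes the meaning of the anchors ^ (caret) and $ (dollar sign) wherever they appear in a pattern: ^ matches at the start of the string and immediately after each newline character \n, and $ matches at the end of the string and immediately before each newline. Without the flag, they match only at the start and end of the whole string.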
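A small sketch of what the flag changes, using a made-up three-line string: ^ and $ pick up every line only when re.MULTILINE is passed.

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ anchors only at the very start of the string.
starts_plain = re.findall(r'^\w+', text)
print(starts_plain)   # ['first']

# With MULTILINE, ^ also matches right after each '\n'.
starts_multi = re.findall(r'^\w+', text, re.MULTILINE)
print(starts_multi)   # ['first', 'second', 'third']

# $ likewise matches right before each '\n' as well as at the very end.
ends_multi = re.findall(r'\w+$', text, re.MULTILINE)
print(ends_multi)     # ['line', 'line', 'line']
```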

Your pattern contains no ^ or $, so re.MULTILINE has no effect on it. You are already matching the line break explicitly with \n, and findall() scans the whole text regardless of the flag.
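A quick check, using the question's pattern written as a raw string (the same regex) and a shortened, made-up copy of its sample data. The flag makes no difference here; the actual failure comes from the character classes. [^.*] means "any character except '.' or '*'", so the body cannot get past the '.' in 1.0. Also, [var ] is a class that matches a single character from v, a, r, or space, not the word "var".

```python
import re

# The question's pattern as a raw string (identical regex).
question_pattern = r'[var ]?\w+ ?= ?{[^.*]+\n};'

sample = '''var fooData = {
    id: 123456789,
    foo_rate: 1.0,
    foo_id: 0987654321
};'''

# [^.*]+ stops at the '.' in 1.0, so nothing matches either way.
no_flag = re.findall(question_pattern, sample)
with_flag = re.findall(question_pattern, sample, re.MULTILINE)
print(no_flag, with_flag)   # [] []

# Without the '.', a match appears, but because [var ] is a
# one-character class it starts at the space before "fooData".
no_dot = sample.replace('1.0', '1')
print(repr(re.findall(question_pattern, no_dot)[0][:12]))  # ' fooData = {'
```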

Upvotes: 0
