Mike S.

Reputation: 215

re.compile regex assistance (python, beautifulsoup)


I am using this code from a different thread:

import re
import requests
from bs4 import BeautifulSoup

data = """
<script type="text/javascript">
    window._propertyData = 
    { *** a lot of random code and other data ***
    "property": {"street": "21st Street", "apartment": "2101", "available": false}
    *** more data ***
    }
</script>
"""

soup = BeautifulSoup(data, "xml")
pattern = re.compile(r'\"street\":\s*\"(.*?)\"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print(pattern.search(script.text).group(1))

This gets me the desired result:

21st Street

However, I want to capture the whole object. I tried different variations of the regex but couldn't get this output:

{"street": "21st Street", "apartment": "2101", "available": false}

I have tried the following:

pattern = re.compile(r'\"property\":\s*\"(.*?)\{\"', re.MULTILINE | re.DOTALL)

It's not producing the desired result.
Your help is appreciated!
Thanks.

Upvotes: 3

Views: 544

Answers (4)

Pavneet_Singh

Reputation: 37404

As mentioned in the comments above, correct your typo and use this:

 r"property\W+({.*?})"

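property : matches the literal string "property"

\W+ : matches one or more non-word characters (here the quote, colon and space between the key and the opening brace)

({.*?}) : capture group one

  • .* matches any characters after the opening brace {
  • ? makes the match lazy, so it stops at the first closing brace }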
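
As a minimal sketch of how this pattern could be applied in Python, the snippet below runs it against a trimmed copy of the sample data from the question. The variable names are only illustrative.

```python
import re

# Sample input copied from the question (script tags omitted).
data = '''
window._propertyData =
{ *** a lot of random code and other data ***
"property": {"street": "21st Street", "apartment": "2101", "available": false}
*** more data ***
}
'''

pattern = re.compile(r"property\W+({.*?})")
match = pattern.search(data)

# group(1) holds the text between the opening brace and the first closing brace.
result = match.group(1) if match else None
print(result)
```

Note that "property" also occurs inside "_propertyData", but that occurrence is skipped because it is followed by a word character rather than by \W+.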

Upvotes: 1

宏杰李

Reputation: 12168

import re
import ast
data = """
<script type="text/javascript">
    window._propertyData =
    { *** a lot of random code and other data ***
    "property": {"street": "21st Street", "apartment": "2101", "available": false}
    *** more data ***
    }
</script>
"""
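# Capture the object after the "property" key (lazy match up to the first closing brace).
# The result is named match to avoid shadowing the built-in property.
match = re.search(r'"property": ({.+?})', data)
str_form = match.group(1)
print('str_form: ' + str_form)
# JSON's false is not a Python literal, so map it to False before evaluating.
dict_form = ast.literal_eval(str_form.replace('false', 'False'))
print('dict_form: ', dict_form)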

out:

str_form: {"street": "21st Street", "apartment": "2101", "available": false}
dict_form:  {'available': False, 'street': '21st Street', 'apartment': '2101'}
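
If the captured text is valid JSON, as it is in the question's sample, an alternative to replacing 'false' by hand is to let the standard json module parse it. This is a sketch under that assumption, not a drop-in fix for arbitrary JavaScript object literals.

```python
import json
import re

data = '''
window._propertyData =
{ *** a lot of random code and other data ***
"property": {"street": "21st Street", "apartment": "2101", "available": false}
*** more data ***
}
'''

match = re.search(r'"property":\s*(\{.+?\})', data)

# json.loads maps JSON false/true/null to Python False/True/None.
property_dict = json.loads(match.group(1))
print(property_dict["street"], property_dict["available"])
```

This avoids accidentally rewriting the word "false" if it ever appears inside a string value.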

Upvotes: 0

Vijay Wilson

Reputation: 516

Try this. It may be long, but it works fine:

\"property\"\:\s*(\{((?:\"\w+\"\:\s*\"?[\w\s]+\"?\,?\s?)+?)\})

https://regex101.com/r/7KzzRV/3
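
A short sketch of how this stricter pattern might be used from Python. Group 1 is the whole object and group 2 is its contents without the braces. The second input, with punctuation in the street name, is a made-up example to show one limitation: values are restricted to word characters and whitespace.

```python
import re

strict = re.compile(
    r'\"property\"\:\s*(\{((?:\"\w+\"\:\s*\"?[\w\s]+\"?\,?\s?)+?)\})'
)

data = '"property": {"street": "21st Street", "apartment": "2101", "available": false}'
match = strict.search(data)
whole = match.group(1)   # the object including braces
inner = match.group(2)   # the key/value pairs only

# Hypothetical input: the period and apostrophe are not matched by [\w\s]+.
other = '"property": {"street": "St. Mark\'s Place", "available": true}'
no_match = strict.search(other)

print(whole)
print(inner)
print(no_match)
```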

Upvotes: 0

Mustofa Rizwan

Reputation: 10476

You can try this:

\"property\":\s*(\{.*?\})

Capture group 1 contains your desired data.

Sample Code:

import re

regex = r"\"property\":\s*(\{.*?\})"

test_str = ("window._propertyData = \n"
    "    { *** a lot of random code and other data ***\n"
    "    \"property\": {\"street\": \"21st Street\", \"apartment\": \"2101\", \"available\": false}\n"
    "    *** more data ***\n"
    "    }")

matches = re.finditer(regex, test_str, re.MULTILINE | re.DOTALL)

for match in matches:
    print(match.group(1))
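
One caveat with the lazy \{.*?\} approach is that it stops at the first closing brace, so a nested object would be cut short. The sketch below assumes the value is valid JSON and uses json.JSONDecoder.raw_decode to find the matching brace instead. The "geo" field is hypothetical and is not part of the original data.

```python
import json
import re

# Hypothetical input with a nested object inside "property".
text = ('{ "property": {"street": "21st Street", '
        '"geo": {"lat": 1.5, "lng": 2.5}, "available": false}, "other": 1 }')

# The lazy regex ends at the first closing brace and truncates the object.
lazy = re.search(r'"property":\s*(\{.*?\})', text).group(1)

# Find where the object starts, then let the JSON decoder consume exactly one value.
start = re.search(r'"property":\s*(?=\{)', text).end()
obj, end = json.JSONDecoder().raw_decode(text, start)

print(lazy)             # truncated text, not valid JSON
print(text[start:end])  # the complete object
```

raw_decode returns the parsed value together with the index where it ended, so text[start:end] gives the original substring if the raw text is needed.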

Upvotes: 0
