Laura

Reputation: 47

How to get the specific content in Python with BeautifulSoup?

I'm new to Python and I'm writing a small scraper with BeautifulSoup to get the address from a webpage. Here is the relevant part of the page:

    </div>
    </div>
    <div data-integration-name="redux-container" data-payload='{"name":"LocationsMapList","props":{"locations":[{"id":17305,"company_id":106906,"description":"","city":"New York","country":"United States","address":"5 Crosby St  3rd Floor","state":"New York","region":"","latitude":40.719753,"longitude":-74.0001954,"hq":true,"created_at":"2015-01-19T01:32:16.317Z","updated_at":"2016-05-05T07:57:19.282Z","zip_code":"10013","country_code":"US","full_address":"5 Crosby St  3rd Floor, New York, 10013, New York, USA","dirty":false,"to_params":"new-york-us"}]},"storeName":null}' data-rwr-element="true">

I fetched the full page content with BeautifulSoup, but I don't know how to extract the value of "full_address". I can see it is inside a "div", but I don't know what to do next.

links = soup.find_all('div')

Thanks a lot!

Upvotes: 2

Views: 1055

Answers (1)

coder

Reputation: 12972

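The data-payload attribute of that div holds a JSON string, so you can read the attribute with BeautifulSoup and parse it with the json module: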

#!/usr/bin/env python 

from bs4 import BeautifulSoup
import json

data = '''
</div>
    </div>
    <div data-integration-name="redux-container" data-payload='{"name":"LocationsMapList","props":{"locations":[{"id":17305,"company_id":106906,"description":"","city":"New York","country":"United States","address":"5 Crosby St  3rd Floor","state":"New York","region":"","latitude":40.719753,"longitude":-74.0001954,"hq":true,"created_at":"2015-01-19T01:32:16.317Z","updated_at":"2016-05-05T07:57:19.282Z","zip_code":"10013","country_code":"US","full_address":"5 Crosby St  3rd Floor, New York, 10013, New York, USA","dirty":false,"to_params":"new-york-us"}]},"storeName":null}' data-rwr-element="true">
'''

soup = BeautifulSoup(data, 'html.parser')
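# Find each div that carries the JSON payload in its data-payload attribute.
for div in soup.find_all('div', attrs={'data-integration-name': 'redux-container'}):
    # The attribute value is a JSON string; decode it into a dict.
    info = json.loads(div.get('data-payload'))
    for location in info['props']['locations']:
        print(location['full_address'])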
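
On a real page some divs may lack the attribute or carry a different payload, so a more defensive version can help. The following is only a sketch: the helper name full_addresses_from_payload is made up for illustration, and it assumes the payload keeps the props/locations layout shown in the question. It takes the raw attribute string (what div.get('data-payload') returns, which is None when the attribute is missing) and uses only the standard library.

```python
import json


def full_addresses_from_payload(payload):
    """Return the full_address of every location in a data-payload string.

    `payload` is the raw attribute value, or None when the attribute is absent.
    (Hypothetical helper; the props/locations layout is assumed from the question.)
    """
    if not payload:
        return []
    try:
        info = json.loads(payload)
    except json.JSONDecodeError:
        # The attribute did not contain valid JSON; treat it as empty.
        return []
    if not isinstance(info, dict):
        return []
    # "props" or "locations" may be missing or null in other payloads.
    locations = (info.get("props") or {}).get("locations") or []
    return [
        loc["full_address"]
        for loc in locations
        if isinstance(loc, dict) and "full_address" in loc
    ]


# Shortened version of the payload from the question.
sample = (
    '{"name":"LocationsMapList","props":{"locations":[{"city":"New York",'
    '"full_address":"5 Crosby St  3rd Floor, New York, 10013, New York, USA"}]},'
    '"storeName":null}'
)
print(full_addresses_from_payload(sample))
```

Returning an empty list instead of raising lets the loop over the divs continue when one of them has no usable payload.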

Upvotes: 2
