Amen Aziz

Reputation: 779

Extract specific text from the <p> tag

I want to extract only the address from the <p> tags, for example Santa Barbara, CA 93101. This is the output I currently get:

[<p class="hide" id="phoneDiv_80863"><i aria-hidden="true" class="fa fa-phone-square"></i> (805) 636-9890</p>, <p>

Santa Barbara, CA 93101



</p>, <p style="margin-top:2em;"><a class="btn btn-default" href="/profile/id/80863/NicoleABotaitis93101" target="_top">View</a> <a class="btn btn-default" href="mailto:[email protected]" id="eml80863" target="_top">Email</a></p>]
[]
[<p class="hide" id="phoneDiv_26092"><i aria-hidden="true" class="fa fa-phone-square"></i> 8058956960</p>, <p>

Santa Barbara, CA 93111

Code

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

limit = 25

url = f'https://www.counselingcalifornia.com/cc/cgi-bin/utilities.dll/customlist?FIRSTNAME=~&LASTNAME=~&ZIP=&DONORCLASSSTT=&_MULTIPLE_INSURANCE=&HASPHOTOFLG=&_MULTIPLE_EMPHASIS=&ETHNIC=&_MULTIPLE_LANGUAGE=ENG&QNAME=THERAPISTLIST&WMT=NONE&WNR=NONE&WHP=therapistHeader.htm&WBP=therapistList.htm&RANGE=1%2F{limit}&SORT=LASTNAME'
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
rows = soup.find_all('div', {'class':'row'})
temp=[]
for row in rows:
    t=row.find_all('div',class_='col-sm-3')
    for i in t:
        u=i.find_all('p')
        print(u)

Upvotes: 0

Views: 304

Answers (2)

Tibebes. M

Reputation: 7558

Here is a more concise solution using a CSS selector:

import requests
from bs4 import BeautifulSoup
import pandas as pd

limit = 25

url = f'https://www.counselingcalifornia.com/cc/cgi-bin/utilities.dll/customlist?FIRSTNAME=~&LASTNAME=~&ZIP=&DONORCLASSSTT=&_MULTIPLE_INSURANCE=&HASPHOTOFLG=&_MULTIPLE_EMPHASIS=&ETHNIC=&_MULTIPLE_LANGUAGE=ENG&QNAME=THERAPISTLIST&WMT=NONE&WNR=NONE&WHP=therapistHeader.htm&WBP=therapistList.htm&RANGE=1%2F{limit}&SORT=LASTNAME'
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

# The address <p> is the third child element of each .col-sm-3 column
for address in soup.select('.col-sm-3>p:nth-child(3)'):
    print(address.text.strip())

Sample Output:

Santa Barbara, CA 93101
Santa Barbara, CA 93111
Santa Barbara, CA 93101
Tustin, CA 92780
Valencia, CA 91355
Pasadena, CA 91105
United States
Walnut Creek, CA 94596
Woodland Hills, CA 91365-0644
Monterey, CA 93940
Granada Hills, CA 91344
United States
Studio City, CA 91604
Santa Rosa, CA 95404
Sonoma
San Dimas, CA 91773
United States
San Francisco, CA 94116
Rancho Mirage, CA 92270
Berkeley, CA 94705-1808
Anderson, CA 96007
Shasta
Mission Viejo, CA 92691
United States
Claremont, CA 91711
Seal Beach, CA 90740
USA
West Covina, CA 91790
Los Angeles
Mission Viejo, CA 92692
Laguna Niguel, CA 92677
Camarillo, CA 93010
West Hills, CA 91308
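
The selector depends on the address being the third child element of its column, because `:nth-child` counts every element sibling, not only `<p>` tags. Below is a minimal offline sketch of that behaviour. The HTML is a hypothetical snippet modelled on the output in the question, and the leading `<h4>` is an assumption standing in for whatever element precedes the first `<p>` on the real page. It also shows `p:nth-of-type(2)`, which counts only `<p>` siblings and may be a little more robust if the leading elements change.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's printed output. The leading
# <h4> is an assumption: some element must precede the first <p> on the real
# page, otherwise p:nth-child(3) would not land on the address.
SAMPLE_HTML = """
<div class="col-sm-3">
  <h4>Example Therapist</h4>
  <p class="hide" id="phoneDiv_80863"><i class="fa fa-phone-square"></i> (805) 636-9890</p>
  <p>

Santa Barbara, CA 93101

  </p>
  <p style="margin-top:2em;"><a class="btn btn-default" href="#">View</a></p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# nth-child counts every element child of the parent, so the address is child 3.
by_child = [p.get_text(strip=True) for p in soup.select(".col-sm-3 > p:nth-child(3)")]

# nth-of-type counts only <p> siblings, so the address is the second <p>.
by_type = [p.get_text(strip=True) for p in soup.select(".col-sm-3 > p:nth-of-type(2)")]

print(by_child)
print(by_type)
```

Both selectors pick the same paragraph in this snippet; which one suits the live page depends on its actual markup, which should be checked in the browser's inspector.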


Upvotes: 2

Rupal Shah

Reputation: 339

Is this what you are looking for?

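# Continues from the question's code, which defines `response`
soup = BeautifulSoup(response.text, 'html.parser')
rows = soup.find_all('div', {'class':'row'})
for row in rows:
    t=row.find_all('div',class_='col-sm-3')
    for i in t:
        # Keep only the second <p>, which holds the address
        u=i.find_all('p')[1:2]
        for each_u in u:
            # The address sits on the second line of the paragraph's text
            address = each_u.text.split('\n')[1]
            print(address)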
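
Taking index 1 of the split assumes the address always sits on the second line of the paragraph's text. A small sketch of an alternative that does not depend on line position is to keep every non-empty line. The helper name `address_lines` and the sample strings are hypothetical, modelled on the addresses shown in this thread.

```python
def address_lines(raw_text):
    """Return the non-empty, whitespace-trimmed lines of a <p> element's text."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


# Hypothetical raw text of an address <p>, including a second line
raw = "\n\nSanta Rosa, CA 95404\nSonoma\n\n\n"
lines = address_lines(raw)

print(lines)             # the individual address lines
print(", ".join(lines))  # the address joined onto a single line
```

In the loop above, this would be called as `address_lines(each_u.text)`.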

Upvotes: 0
