anonymous

Reputation: 33

Scraping webpage data using BeautifulSoup

I am trying to scrape the text details of store locations and write them to a CSV file using BeautifulSoup. The 2 stores in Alabama are in one element with class LocationSecContent, and the 17 stores in Arizona are in another element with the same class. In Georgia, the first store (Airport) is in a single element with class location inside LocationSecContent, and the remaining 4 Georgia stores are in another element with class location inside LocationSecContent. I would like to scrape the text and write the store details (name, location, street, phone, fax, hours content, and so on) to a CSV file. I'm inspecting the page with Firebug in Firefox. Sorry if there are any mistakes; I'm a beginner with BeautifulSoup.

Here is what I have tried:

from bs4 import BeautifulSoup
import requests

page = requests.get('http://freshvites.com/store-locator/')

soup = BeautifulSoup(page.text, 'html.parser')
d={}
for table in soup.find_all("div", {"class":"content freshvites-location"}):
    table
for col in table.find_all("td"):

    LocationSecHdr=col.find_all("div",{'class':'LocationSecHdr'})
    Location=col.find_all("div",{'class':'location'})


dt="LocationSecHdr:%s,Location: %s" %(LocationSecHdr, Location)
zx=BeautifulSoup(dt, "html.parser")

print zx.get_text()

I'm not able to iterate through rows and scrape the text.

Method 2:

from bs4 import BeautifulSoup

import requests


page = requests.get('http://freshvites.com/store-locator/')
#print page


soup = BeautifulSoup(page.text, 'html.parser')
#print soup.find_all('a')

for table in soup.find_all("div",{'class':'content freshvites-location'}):
    table


LocationSecHdr=''
LocationSecContent=''
Location=''
LocationTitle=''
Phone=''
Fax=''
HoursTitle=''
HoursContent=''


for col in table.find_all("td"):      
    LocationSecHdr=col.find_all("div",{'class':'LocationSecHdr'})
    #LocationSecContent= col.find_all("div",{'class':'LocactionSecContent'})
    #Location= col.find_all("div",{'class':'location'})
    LocationTitle= col.find_all("div",{'class':'locationTitle'})
    Phone= col.find_all("div",{'class':'Phone'})
    Fax= col.find_all("div",{'class':'Fax'})
    HoursContent=col.find_all("div",{'class':'HoursContent'})

    data="LocationSecHdr: %s, LocationSecContent: %s, Location:%s, LocationTitle : %s, Phone:%s, Fax :%s, HoursContent:%s " %(LocationSecHdr, LocationSecContent, Location, LocationTitle, Phone, Fax, HoursContent)
    zax=BeautifulSoup(data,"html.parser")

print zax.get_text()

With this code I can't get the address of the store, and I don't know how to get these details as a dict either.

Upvotes: 0

Views: 496

Answers (1)

egonr

Reputation: 982

I think I have enough information now to answer your question.

You are looking for the wrong tag/class combination. All the information for a location is contained inside a <div class="location">. Here is a sample:

<div class="location">
<div class="locationTitle">32nd Street &amp; Thunderbird</div>
Fresh Vitamins<br> 
13802 N. 32nd St #11<br> 
Phoenix, AZ 85032<br>
<div class="Phone">&nbsp;</div>
<div class="Fax">877.935.6902</div>
<div class="HoursTitle">Hours:</div>
<div class="HoursContent">9am - 7pm M-F<br> 9am - 6pm Sat<br> 11am - 4pm Sun</div>
</div>

As you can see in the sample, there is no <tr> or <td>, so searching for those tags doesn't make sense.

Here's a short Python script I put together to find all locations:

from bs4 import BeautifulSoup
import requests

page = requests.get('http://freshvites.com/store-locator/')

soup = BeautifulSoup(page.content, 'html.parser')

for div in soup.find_all("div", {"class":"location"}):
    print(div)

Now you just need to filter the information you need from each div. Everything you need for that should be easy to find on Stack Overflow.
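
One possible way to do that filtering is sketched below. It assumes every location on the live page follows the same markup as the sample above, which I have not verified for all states. The helper names (child_text, parse_location, parse_locations, to_csv) and the CSV column names are made up for illustration. The sketch parses the sample markup rather than fetching the page, so it runs without network access.

```python
import csv
import io

from bs4 import BeautifulSoup, NavigableString

# Sample markup copied from the answer above; the live page may differ.
SAMPLE_HTML = """
<div class="location">
<div class="locationTitle">32nd Street &amp; Thunderbird</div>
Fresh Vitamins<br>
13802 N. 32nd St #11<br>
Phoenix, AZ 85032<br>
<div class="Phone">&nbsp;</div>
<div class="Fax">877.935.6902</div>
<div class="HoursTitle">Hours:</div>
<div class="HoursContent">9am - 7pm M-F<br> 9am - 6pm Sat<br> 11am - 4pm Sun</div>
</div>
"""

# Hypothetical column names, chosen only for this example.
FIELDS = ["title", "address", "phone", "fax", "hours"]


def child_text(div, css_class):
    """Return the text of the first nested div with this class, or ''."""
    node = div.find("div", class_=css_class)
    if node is None:
        return ""
    # The separator keeps <br>-separated pieces apart; strip=True drops
    # whitespace-only pieces such as a lone &nbsp;.
    return node.get_text(" | ", strip=True)


def parse_location(div):
    """Turn one <div class="location"> into a dict of its fields."""
    # The address lines are bare text nodes directly under the div,
    # not wrapped in a tag of their own.
    address_lines = [
        str(child).strip()
        for child in div.children
        if isinstance(child, NavigableString) and str(child).strip()
    ]
    return {
        "title": child_text(div, "locationTitle"),
        "address": ", ".join(address_lines),
        "phone": child_text(div, "Phone"),
        "fax": child_text(div, "Fax"),
        "hours": child_text(div, "HoursContent"),
    }


def parse_locations(html):
    """Return a list of dicts, one per location div found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [parse_location(div) for div in soup.find_all("div", class_="location")]


def to_csv(rows):
    """Serialize the dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_locations(SAMPLE_HTML)))
```

To try it on the real page, pass requests.get('http://freshvites.com/store-locator/').content to parse_locations instead of SAMPLE_HTML, and write the result of to_csv to a file. Each row is already a dict, which covers the "as a dict" part of the question.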

Upvotes: 1
