Iterating through list of URLs in Python - bs4

Question

I have one .txt file (named test_1.txt) that is formatted as follows:

https://maps.googleapis.com/maps/api/directions/xml?origin=Bethesda,MD&destination=Washington,DC&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Miami,FL&destination=Mobile,AL&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Scranton,PA&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Baltimore,MD&destination=Charlotte,NC&sensor=false&mode=walking

If you open one of the links above, you'll see the output in XML. With the code below, I've only managed to get it to request the second set of directions (Miami to Mobile), and it prints seemingly random data that isn't what I want. I can get it working, printing exactly the data I need, when I use one URL at a time written directly in the code instead of read from the .txt file. Is there any reason it only goes to the second URL and prints the wrong info? The Python code is below:

import urllib2
from bs4 import BeautifulSoup

with open('test_1.txt', 'r') as f:
    f.readline()
    mapcalc = f.readline()
    response = urllib2.urlopen(mapcalc)
    soup = BeautifulSoup(response)

for leg in soup.select('route > leg'):
    duration = leg.duration.text.strip()
    distance = leg.distance.text.strip()
    start = leg.start_address.text.strip()
    end = leg.end_address.text.strip()
    print duration
    print distance
    print start
    print end

EDIT:

This is the output of the Python Code in the Shell:

56
1 min
77
253 ft
Miami, FL, USA
Mobile, AL, USA

hmcmurray · Accepted Answer

Here's a link that could shed more light on the behavior you can get when opening files and reading lines, etc. (related to Lev Levitsky's comment).
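To make the file-reading behavior concrete, here is a minimal sketch (Python 3) of what two consecutive readline() calls do. It uses an in-memory io.StringIO as a stand-in for test_1.txt, and the short example.com URLs are placeholders, not the ones from the question.

```python
import io

# Stand-in for test_1.txt; these URLs are placeholders, not from the question.
fake_file = io.StringIO(
    "http://example.com/a\n"
    "http://example.com/b\n"
    "http://example.com/c\n"
)

fake_file.readline()            # reads the first line and throws it away
second = fake_file.readline()   # so this is the SECOND line, newline included

fake_file.seek(0)               # rewind to the start of the "file"
# Iterating over the file object visits every line exactly once.
all_urls = [line.strip() for line in fake_file if line.strip()]

print(repr(second))
print(all_urls)
```

Each readline() call advances the file position by one line, which would explain why the question's code only ever requests the second URL.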

One way:

import httplib2
from bs4 import BeautifulSoup

http = httplib2.Http()
with open('test_1.txt', 'r') as f:
    # Iterating over the file yields every line, so no URL is skipped.
    for mapcalc in f:
        # Strip the trailing newline before requesting the URL.
        status, response = http.request(mapcalc.strip())
        for leg in BeautifulSoup(response):
            duration = leg.duration.text.strip()
            distance = leg.distance.text.strip()
            start = leg.start_address.text.strip()
            end = leg.end_address.text.strip()
            print duration
            print distance
            print start
            print end

# No f.close() is needed: the with block closes the file on exit.

I'm new to this sort of thing but I got the above code to work with the following output:

4877
1 hour 21 mins
6582
4.1 mi
Bethesda, MD, USA
Washington, DC, USA
56
1 min
77
253 ft
Miami, FL, USA
Mobile, AL, USA
190
3 mins
269
0.2 mi
Chicago, IL, USA
Scranton, PA, USA
12
1 min
15
49 ft
Baltimore, MD, USA
Charlotte, NC, USA
