Reputation: 1
I am new to Python and could use a little help. I am trying to write a script that will go to a specific web site and download multiple .gif images from different spots on that site. Can anyone point me in the right direction? This is the first script I have tried to write.
Here is what I have so far.
from bs4 import BeautifulSoup as bs
import urlparse
from urllib2 import urlopen
from urllib import urlretrieve
import os
import sys

def main(url, out_folder=r"C:\Users\jerry\Desktop\Heli"):
    """Downloads all the images at 'url' into out_folder"""
    soup = bs(urlopen(url), "html.parser")
    parsed = list(urlparse.urlparse(url))

    for gif in soup.findAll("img"):
        print "gif: %(src)s" % gif
        filename = gif["src"].split("/")[-1]
        # resolve relative src values against the page URL
        parsed[2] = gif["src"]
        outpath = os.path.join(out_folder, filename)
        if gif["src"].lower().startswith("http"):
            urlretrieve(gif["src"], outpath)
        else:
            urlretrieve(urlparse.urlunparse(parsed), outpath)

def _usage():
    print "usage: python dumpimages.py http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/ [outpath]"

if __name__ == "__main__":
    url = sys.argv[-1]
    out_folder = "/test/"
    if not url.lower().startswith("http"):
        out_folder = sys.argv[-1]
        url = sys.argv[-2]
        if not url.lower().startswith("http"):
            _usage()
            sys.exit(-1)
    main(url, out_folder)
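For reference, urllib2 and urlparse are Python 2 only. On Python 3 the same skeleton would look roughly like this (an untested sketch; urljoin replaces the manual urlparse/urlunparse juggling):

from bs4 import BeautifulSoup as bs
from urllib.request import urlopen, urlretrieve
from urllib.parse import urljoin
import os

def main(url, out_folder=r"C:\Users\jerry\Desktop\Heli"):
    """Downloads every <img> at 'url' into out_folder."""
    soup = bs(urlopen(url), "html.parser")
    for img in soup.find_all("img"):
        src = img["src"]
        filename = src.split("/")[-1]
        # urljoin handles absolute and relative src values alike
        urlretrieve(urljoin(url, src), os.path.join(out_folder, filename))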
Upvotes: 0
Views: 45
Reputation: 21643
Here is the basic idea.
>>> import requests
>>> from bs4 import BeautifulSoup
>>> item = requests.get('http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/')
>>> page = item.text
>>> soup = BeautifulSoup(page, 'lxml')
>>> links = soup.findAll('a')
>>> for link in links:
...     if '.gif' in link.attrs['href']:
...         print(link.attrs['href'])
...         break
...
CCAR_HHZ_AG_00.2017012700.gif?v=1485534942
The break statement is there just to interrupt the loop so that it doesn't print the names of all the gifs. The next step would be to add code to that loop that concatenates the URL passed to requests.get with the name of each gif and does a requests.get for it. This time, though, you would do, say, image = pic.content to get the image as bytes, which you could write to a file of your choice.
EDIT: Fleshed out. Note that you still need to arrange to provide one file name for each output file; as written, every download overwrites pic.gif. One way to do that is sketched after the transcript below.
>>> import requests
>>> from bs4 import BeautifulSoup
>>> URL = 'http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/'
>>> item = requests.get(URL)
>>> page = item.text
>>> soup = BeautifulSoup(page, 'lxml')
>>> links = soup.findAll('a')
>>> for link in links:
...     if '.gif' in link.attrs['href']:
...         print(link.attrs['href'])
...         pic = requests.get(URL + link.attrs['href'])
...         image = pic.content
...         open('pic.gif', 'wb').write(image)
...         break
...
CCAR_HHZ_AG_00.2017012700.gif?v=1485535857
100846
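One way to get a distinct name per file is to strip the ?v=... cache-buster from each href and use the rest as the filename (an illustrative sketch, assuming the hrefs are plain file names relative to the directory listing):

import requests
from bs4 import BeautifulSoup

URL = 'http://folkworm.ceri.memphis.edu/heli/heli_bb_ag/'
soup = BeautifulSoup(requests.get(URL).text, 'lxml')
for link in soup.findAll('a'):
    href = link.attrs.get('href', '')
    if '.gif' in href:
        # 'CCAR_HHZ_AG_00.2017012700.gif?v=1485535857' -> 'CCAR_HHZ_AG_00.2017012700.gif'
        filename = href.split('?')[0]
        pic = requests.get(URL + href)
        with open(filename, 'wb') as f:
            f.write(pic.content)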
Upvotes: 1