Wandering Sophist

Reputation: 143

How do I parse every HTML file in a directory for images?

I have a directory full of HTML files, each of which contains a clinical image of a psoriasis patient. I want to open each file, find the image, and save it in the same directory.

import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in path:
    for f in files:
        soup = bs(f)
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")

I get this error:

 Traceback (most recent call last):
   File "C:\Users\gokalraina\Desktop\modfile.py", line 7, in <module>
     for root, dirs, files in path:
 ValueError: need more than 1 value to unpack

Even after googling the error, I have no clue what is wrong or whether I am doing this correctly. Please keep in mind that I am new to Python.

EDIT: After making the suggested changes to the program, I am still getting an error:

 Traceback (most recent call last):
   File "C:\Users\gokalraina\Desktop\modfile.py", line 25, in <module>
     im = Image.open(image)
   File "C:\Python27\lib\site-packages\PIL\Image.py", line 1956, in open
     prefix = fp.read(16)
 TypeError: 'NoneType' object is not callable

This is the revised code (thanks to nightcracker):

import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in os.walk(path):
    for f in files:
        soup = bs(open(os.path.join(root, f)).read())
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")

Upvotes: 1

Views: 1612

Answers (3)

user177800

The for loop needs to iterate over something that yields (root, dirs, files) triples, which is what os.walk(path) provides. A plain string only yields single characters, so the three-way unpacking fails.

The idiomatic way of walking a file system is to use os.walk():

for root, dirs, files in os.walk(path):
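
As a small illustration of what os.walk yields, the sketch below builds a throwaway directory tree and collects the HTML files in it. It is written for Python 3, and the helper names find_html_files and make_demo_tree, as well as the sample file names, are made up for the example.

```python
import os
import tempfile


def find_html_files(top):
    """Return the full paths of all .html/.htm files below top (hypothetical helper)."""
    matches = []
    # os.walk yields one (root, dirs, files) tuple per directory it visits
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.lower().endswith((".html", ".htm")):
                # files holds bare names, so join each with its directory
                matches.append(os.path.join(root, name))
    return sorted(matches)


def make_demo_tree():
    """Create a temporary tree with made-up file names for the demo."""
    top = tempfile.mkdtemp()
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.html", "notes.txt", os.path.join("sub", "b.HTML")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("<html></html>")
    return top


if __name__ == "__main__":
    demo_top = make_demo_tree()
    for found in find_html_files(demo_top):
        print(found)
```

Filtering on the extension also keeps the loop from trying to parse non-HTML files that happen to sit in the same directory.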

Upvotes: 1

kindall

Reputation: 184455

for root, dirs, files in path:

path here is a string. Iterating over it yields one character at a time, and a single character cannot be unpacked into three variables. Hence the error message: need more than 1 value to unpack.

You probably want:

for root, dirs, files in os.walk(path):
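
The difference can be reproduced without touching the file system. The following is a minimal Python 3 sketch (the helper try_unpack and the sample values are hypothetical) that applies the same three-way unpacking first to a string and then to a list of 3-tuples, which is the shape os.walk produces. Note that Python 3 words the ValueError differently from the Python 2 message in the question.

```python
def try_unpack(iterable):
    """Try the question's three-way unpacking on the first item of iterable."""
    try:
        for root, dirs, files in iterable:
            return "unpacked"
    except ValueError as exc:
        return "ValueError: %s" % exc
    return "empty"


# A string yields one character at a time, so "C" cannot fill three names
string_result = try_unpack("C:/some/path")

# A sequence of 3-tuples, the shape os.walk produces, unpacks cleanly
tuple_result = try_unpack([("root", ["subdir"], ["page.html"])])

if __name__ == "__main__":
    print(string_result)
    print(tuple_result)
```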

Upvotes: 1

orlp

Reputation: 117926

You need to change this line:

for root, dirs, files in path:

to

for root, dirs, files in os.walk(path):

Also note that files contains file names, not file objects, so each name has to be joined with root and opened before it can be parsed. This would be your fixed code:

import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in os.walk(path):
    for f in files:
        soup = bs(open(os.path.join(root, f)).read())
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")
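
The TypeError in the question's edit arises because Image.open is handed the img tag itself, whereas it expects a filename or a file object. A rough sketch of the missing step, turning each src attribute into a path that could then be passed to Image.open, is shown below. It is written for Python 3 and swaps in the standard library's html.parser in place of BeautifulSoup so that it runs without extra packages. The names ImgSrcCollector and image_paths are hypothetical, and the sketch assumes every src is a local path relative to the HTML file rather than a URL.

```python
import os
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_paths(html_path):
    """Return the image paths referenced by one HTML file.

    Assumes each src is a local path relative to the HTML file, not a URL.
    """
    with open(html_path, encoding="utf-8", errors="replace") as fh:
        collector = ImgSrcCollector()
        collector.feed(fh.read())
    folder = os.path.dirname(html_path)
    return [os.path.normpath(os.path.join(folder, src)) for src in collector.sources]
```

Each returned path, rather than the tag, is what would go to Image.open. Building the output name with os.path.join instead of path+image["src"] also avoids losing the separator between the directory and the file name.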

Upvotes: 1
