Printing the content of all html files in a directory with BeautifulSoup

Question

I opened a directory containing 200 HTML files using BeautifulSoup, but when I try to print the content of the whole directory with print(soup.prettify()), it shows the content of only one HTML file. The same happens if I try soup.find('title'): it only returns the title of that same HTML file. Can you tell me why? Python does not show any error, and I cannot understand what is wrong in my code.


import os
from bs4 import BeautifulSoup
import glob
import errno

dir_path = '/directory/path/to/folder/'
files = glob.glob(dir_path)
for name in files:
    try:
        with open(name) as f:
            soup = BeautifulSoup(f, "html.parser")
            print(type(soup))
    except IOError as exc:
        if exc.errno != errno.EISDIR:
            raise

print(type(soup))
soup.find('title')

PARVATHY · Accepted Answer

The glob module finds all the pathnames matching a specified pattern (see the glob documentation). A bare directory path contains no wildcard, so it can match only the directory itself, not the files inside it. Instead, pass dir_path as a pattern that matches all the HTML file names by using the wildcard character *:

dir_path = '/directory/path/to/folder/*.html'
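Putting the pattern together with the loop from the question, a minimal sketch could look like the following. The helper names html_files and print_all are made up for illustration, and the utf-8 encoding is an assumption about the files. Note that in the question's code soup is reassigned on every pass through the loop, so anything done with it after the loop sees only the last file that was parsed; the sketch therefore prints inside the loop.

```python
import glob
import os


def html_files(dir_path):
    """Return the sorted paths of all .html files directly inside dir_path."""
    # os.path.join copes with dir_path given with or without a trailing slash.
    return sorted(glob.glob(os.path.join(dir_path, "*.html")))


def print_all(dir_path):
    """Parse and print every HTML file in dir_path, one after another."""
    # Imported here so html_files() stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    for name in html_files(dir_path):
        # The encoding is an assumption; adjust it to match the files.
        with open(name, encoding="utf-8") as f:
            soup = BeautifulSoup(f, "html.parser")
        # Use soup here, inside the loop: it is rebound on every iteration.
        title = soup.find("title")
        print(name, title.get_text() if title else None)
        print(soup.prettify())
```

Calling print_all('/directory/path/to/folder/') would then print each file's name, title, and prettified markup in turn. If the results are needed after the loop, one option is to append each soup (or just the extracted titles) to a list instead of printing.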
