reuben

Reputation: 91

Iterate through and use HTML files in a directory - Python

I need to iterate through the .html files in a given directory and scrape data from them. This is my code so far. How would I access the script inside each file?

import os
directory = '/Users/xxxxx/Documents/sample/'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        print(os.path.join(directory, filename))

(System: Mac, Python 3.x)

Upvotes: 0

Views: 2543

Answers (1)

Den1al

Reputation: 647

You could open each file and hand its contents to BeautifulSoup (the third-party bs4 package), something like this:

import os
from bs4 import BeautifulSoup

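directory = '/Users/xxxxx/Documents/sample/'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory, filename)
        with open(fname, 'r') as f:
            # build a parse tree from the file with Python's built-in parser
            soup = BeautifulSoup(f.read(), 'html.parser')
            # parse the html as you wish, e.g. soup.find_all('script')
            # returns every <script> tag in the file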
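
If what you want is specifically the contents of the <script> tags and you would rather not depend on bs4, the same loop can be sketched with only the standard library's html.parser. Treat this as a rough sketch rather than the only way to do it: ScriptCollector and scripts_in_directory are names I made up for illustration, and it assumes the files are UTF-8 encoded.

```python
from html.parser import HTMLParser
from pathlib import Path


class ScriptCollector(HTMLParser):
    """Collects the text inside every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # one string per <script> element seen
        self._in_script = False  # True while between <script> and </script>

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self._in_script = True
            self.scripts.append('')

    def handle_endtag(self, tag):
        if tag == 'script':
            self._in_script = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current entry
        if self._in_script:
            self.scripts[-1] += data


def scripts_in_directory(directory):
    """Map each .html filename in `directory` to a list of its script bodies."""
    results = {}
    for path in sorted(Path(directory).glob('*.html')):
        collector = ScriptCollector()
        # Assumption: files are UTF-8; change the encoding if yours differ
        collector.feed(path.read_text(encoding='utf-8'))
        collector.close()
        results[path.name] = collector.scripts
    return results
```

Calling scripts_in_directory('/Users/xxxxx/Documents/sample/') gives you a dict keyed by filename, so you can loop over it and process each script's text however you need.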

Upvotes: 3
