Reputation: 91
I need to iterate through the .html files in a given directory and scrape data from them. This is my code so far. How would I access the script inside each file?
import os
directory = '/Users/xxxxx/Documents/sample/'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        print(os.path.join(directory, filename))
    else:
        continue
(System: Mac/Python3.x)
Upvotes: 0
Views: 2543
Reputation: 647
You could do something like this:
import os
from bs4 import BeautifulSoup
directory = '/Users/xxxxx/Documents/sample/'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory, filename)
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
            # parse the html as you wish
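If "the script inside" means the contents of the <script> tags, then with BeautifulSoup you would typically call soup.find_all('script') at the point of the comment above. As a rough sketch of the same idea without any third-party dependency, the snippet below swaps in the standard library's html.parser instead of BeautifulSoup. The names ScriptCollector and scripts_in_directory are made up for illustration, and UTF-8 encoding is an assumption about your files.

```python
from html.parser import HTMLParser
from pathlib import Path


class ScriptCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # one string per <script> element
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


def scripts_in_directory(directory):
    """Map each .html filename in `directory` to its list of script bodies."""
    results = {}
    for path in sorted(Path(directory).glob("*.html")):
        collector = ScriptCollector()
        # Assumption: the files are UTF-8 encoded; adjust if yours are not.
        collector.feed(path.read_text(encoding="utf-8"))
        collector.close()
        results[path.name] = collector.scripts
    return results
```

Calling scripts_in_directory(directory) with the directory from the question should give you a dict keyed by filename, which you can then loop over. If you need more than the script bodies (attributes, nested elements, CSS selectors), BeautifulSoup as shown above is probably the more convenient choice.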
Upvotes: 3