I want to read HTML files in Python. Normally I do it like this (and it works):
import codecs
with codecs.open("test.html", 'r') as f:
    print(f.read())
The problem is that my HTML files are not all in the same folder. A program generates these HTML files and saves them into folders inside the folder that holds my reading script. In short: my script is in one folder, and inside that folder there are more folders containing the generated HTML files.
Does anybody know how I can proceed?
Use os.walk:
import os
import codecs
# Replace "/mydir" with the folder that contains the generated files
for root, dirs, files in os.walk("/mydir"):
    for name in files:
        if name.endswith(".html"):
            with codecs.open(os.path.join(root, name), 'r') as f:
                print(f.read())
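On Python 3 (an assumption, since the code above is Python 2), pathlib can do the recursive search and the extension filter in one call. A minimal sketch, assuming the generated files are UTF-8; the helper read_html_files and the demo folder names are hypothetical:

```python
import tempfile
from pathlib import Path


def read_html_files(base_dir, encoding="utf-8"):
    """Yield (path, text) for every .html file in base_dir or any subfolder."""
    # rglob("*.html") matches recursively, like os.walk plus an extension check
    for path in sorted(Path(base_dir).rglob("*.html")):
        # Assumes UTF-8 content; pass a different encoding if the files differ
        yield path, path.read_text(encoding=encoding)


# Demo on a throwaway tree (folder names are hypothetical)
with tempfile.TemporaryDirectory() as base:
    sub = Path(base) / "generated" / "run1"
    sub.mkdir(parents=True)
    (sub / "report.html").write_text("<h1>Report</h1>", encoding="utf-8")
    (sub / "log.txt").write_text("not html", encoding="utf-8")
    # Consume the generator before the temporary directory is deleted
    results = [(p.name, text) for p, text in read_html_files(base)]

print(results)
# → [('report.html', '<h1>Report</h1>')]
```

Because it is a generator, files are read one at a time, which keeps memory use low when there are many generated pages.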
import os
import codecs
for root, dirs, files in os.walk("./"):
    for name in files:
        # os.path.join builds the path portably (no hard-coded '/')
        path = os.path.join(root, name)
        file_name, file_ext = os.path.splitext(path)
        if file_ext == '.html':
            with codecs.open(path, 'r') as f:
                print(f.read())
This walks through ./ and loops through all files in it and in every sub-directory. Note that ./ is the current working directory, which is your script's directory only if you start the script from there. For each file it checks whether the extension is .html and, if so, does the work on that file.
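If the script might be launched from another directory, the walk can be anchored to the script's own folder through __file__ instead of ./. A minimal sketch for Python 3 (assumed); the helper iter_html_paths and the demo folder names are hypothetical:

```python
import os
import tempfile

try:
    # Absolute directory of this script, independent of the working directory
    SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
except NameError:
    # __file__ is undefined in an interactive session; fall back to the cwd
    SCRIPT_DIR = os.getcwd()


def iter_html_paths(base_dir):
    """Yield the path of every .html file in base_dir and its subdirectories."""
    for root, dirs, files in os.walk(base_dir):
        for name in files:
            if os.path.splitext(name)[1] == ".html":
                yield os.path.join(root, name)


# Demo on a temporary tree; in the real script, pass SCRIPT_DIR instead
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "out", "2024"))
    for rel in (os.path.join("out", "a.html"),
                os.path.join("out", "2024", "b.html"),
                os.path.join("out", "b.css")):
        open(os.path.join(base, rel), "w").close()  # create empty files
    found = sorted(os.path.relpath(p, base) for p in iter_html_paths(base))

print(SCRIPT_DIR)
print(found)
```

In the real script, iter_html_paths(SCRIPT_DIR) yields the same paths the ./ version finds when run from the script's folder, but it no longer depends on where the script is started.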
You may also want to accept more file extensions (for instance .htm).
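One way to accept several extensions: str.endswith takes a tuple of suffixes, and lowercasing the name first also catches upper-case variants such as .HTM. A small sketch; the is_html helper and the HTML_EXTENSIONS tuple are hypothetical names:

```python
# Extensions to accept; extend this tuple as needed
HTML_EXTENSIONS = (".html", ".htm")


def is_html(filename):
    """True if filename ends with one of HTML_EXTENSIONS, ignoring case."""
    # str.endswith accepts a tuple and returns True if any suffix matches
    return filename.lower().endswith(HTML_EXTENSIONS)


names = ["index.html", "OLD.HTM", "page.htm", "notes.txt", "style.css", "archive.html.bak"]
print([n for n in names if is_html(n)])
# → ['index.html', 'OLD.HTM', 'page.htm']
```

Inside the os.walk loop, a check like if is_html(name): would replace the single-extension comparison.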