Reputation: 81
I'm trying to read .html files with pd.read_html(). However, each .html file is in a different directory, so I've iterated over each directory and put the path/name + html_file_name in a list called html_paths. I want to iterate over this list and read each .html file in html_paths with pd.read_html().
I've tried to iterate over the html_paths like this:
for i in range(len(html_paths)):
    html_files = pd.read_html(html_paths[i])
I also tried to glob the original html_paths I set up with this:
for i in path.glob('**/*.html'):
    html_files = pd.read_html(i)
Whichever way I iterate over my pathlib list, I get an error similar to TypeError: Cannot read object type 'WindowsPath'
So far I've written:
from pathlib import Path

# initialize path (raw string so the backslashes are not treated as escapes)
p = Path(r'C:\path\to\mother\directory')
# iterate over all directories within mother directory
# glob all html files
html_paths = [file for file in p.glob('**/*.html')]
And now I want to iterate over each file in html_paths and read them into pd.read_html()
Upvotes: 0
Views: 384
Reputation: 827
Your html_paths list contains Path objects, not the strings that read_html is expecting. Try converting each one to a string:
for i in range(len(html_paths)):
    html_files = pd.read_html(str(html_paths[i]))
Upvotes: 1