BatRuby

Reputation: 11

How to scrape local file with BeautifulSoup

I'm learning Python by following this online course:

https://openclassrooms.com/fr/courses/7168871-apprenez-les-bases-du-langage-python/exercises/4173

At the end of the lesson, we learn the ETL process.

In question 3, I have to load an HTML file and parse it with BeautifulSoup in a Python script.

Here is the problem: so far I have only scraped data from a website. I store the site's URL in a variable, fetch the page, and then create a soup variable from its content:

import requests
from bs4 import BeautifulSoup

url = 'https://www.gov.uk/search/news-and-communications'
reponse = requests.get(url)
page = reponse.content
soup = BeautifulSoup(page, 'html.parser')

This is easy because the HTML comes from a URL, but how can I do the same with a file on my own machine?

  1. I created a new HTML file containing the markup (the file is named TestOC.html).
  2. I created a new Python file with the following code:
from bs4 import BeautifulSoup

soup = BeautifulSoup('TestOC.html', 'html.parser')

But the file isn't loaded. How can I do this?

Upvotes: 1

Views: 491

Answers (1)

Mureinik

Reputation: 311823

BeautifulSoup takes the markup itself, not a file name. You can open the file yourself and read() it, though:

from bs4 import BeautifulSoup

# Read the file's content, then hand that content to BeautifulSoup
with open('TestOC.html') as f:
    content = f.read()
    soup = BeautifulSoup(content, 'html.parser')
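
To make the difference concrete, here is a minimal, self-contained sketch. It assumes a small stand-in for TestOC.html (the HTML string and the temporary directory are made up for illustration, not taken from the course exercise) and an explicit UTF-8 encoding. It shows that a file name passed as a string is parsed as if it were the markup, and that an open file handle can also be passed directly instead of calling read() first.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical stand-in for TestOC.html so the sketch runs anywhere.
html = "<html><head><title>Test OC</title></head><body><p>Hello</p></body></html>"
path = Path(tempfile.mkdtemp()) / "TestOC.html"
path.write_text(html, encoding="utf-8")

# Passing the file name: the string itself is treated as the markup,
# so the result holds only the text "TestOC.html" and no tags.
# (bs4 may emit a warning here because the input looks like a file name.)
wrong = BeautifulSoup("TestOC.html", "html.parser")
print(wrong.title)       # no <title> tag was found
print(wrong.get_text())  # the file name, parsed as plain text

# Passing an open file handle: BeautifulSoup reads the content itself.
with open(path, encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

print(soup.title.string)
```

Reading the content first or passing the handle should give the same soup; which one to use is mostly a matter of taste. Setting the encoding explicitly is an assumption about the file, so adjust it if TestOC.html was saved differently.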

Upvotes: 2
