Reputation: 75
I've been working with Python for about two weeks. I've been able to import an Access database and store the values returned by a SQL SELECT, but one of the fields has HTML tags embedded in it. I'm trying to use HTMLParser from html.parser, and I put the class definition in a separate file, html_parser.py, which I import in my main file. When I try to create the parser from my main file, I get an error. Here are the commands in my main .py file:
from html.parser import HTMLParser
import html_parser
parser = MyHTMLParser()
Here are the commands in the html_parser.py file:
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    def handle_data(self, data):
        print("Encountered some data :", data)
        global data_str
        data_str = data_str + "!#@@@@@#!" + data
global data_str
data_str = ""
Here is the error that appears when I run my main .py file with python from a command prompt:
C:\Users\Owner\AppData\Local\Programs\Python\Python39>python py_script8.py
Importing MyHTMLParser class
Traceback (most recent call last):
  File "C:\Users\Owner\AppData\Local\Programs\Python\Python39\py_script8.py", line 36, in <module>
    parser = MyHTMLParser()
NameError: name 'MyHTMLParser' is not defined
If anyone has any insight, it would be greatly appreciated. (Working with Python has been a blast.)
**** Solution **** MM truly helped me! Thanks so much! Here is what I did:
Near the top of the main .py file, I added this to load html_parser and create the parser:
import html_parser
if __name__ == '__main__':
    parser = html_parser.MyHTMLParser()
Then, in a function called from the for loop that iterates over the records returned by the SQL statement (which selects all rows from the imported Access database):
global r_str
parser.data_str = ""
parser.feed(r_str)
#print(parser.data_str)
r_str = parser.data_str
The contents of html_parser.py are:
from html.parser import HTMLParser
class MyHTMLParser(HTMLParser):
    def __init__(self):
        # Superclass initialization.
        super().__init__()
        # Variables are initialized here.
        self.data_str = ""
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    def handle_data(self, data):
        print("Encountered some data :", data)
        self.data_str += "!#@@@@@#!" + data
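For anyone reading later, here is a minimal, self-contained sketch of the per-record loop described above. It is not my exact script: the `records` list is made-up sample data standing in for the database rows, `extract_text` is a hypothetical helper name, and the class is the same idea as above with the debug prints removed.

```python
from html.parser import HTMLParser

DELIMITER = "!#@@@@@#!"  # same separator string as in the class above


class MyHTMLParser(HTMLParser):
    """Same idea as the class above, minus the debug prints."""

    def __init__(self):
        super().__init__()
        self.data_str = ""

    def handle_data(self, data):
        self.data_str += DELIMITER + data


def extract_text(parser, html_text):
    """Hypothetical helper: return the text collected from one HTML fragment."""
    parser.reset()         # discard unprocessed input left over from the previous record
    parser.data_str = ""   # clear the text collected for the previous record
    parser.feed(html_text)
    parser.close()         # force processing of any text still buffered
    return parser.data_str


# Made-up records standing in for the rows read from the Access database.
records = ["<p>First <b>row</b></p>", "<div>Second row</div>", "plain text"]

parser = MyHTMLParser()
cleaned = [extract_text(parser, r_str) for r_str in records]
print(cleaned)
```

Clearing `data_str` before each `feed` is what keeps text from one record from leaking into the next when a single parser object is reused.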
Upvotes: 3
Views: 114
Reputation: 186
Corrected the answer.
Please try this.
The main code is written in task.py.
task.py
import html_parser
if __name__ == '__main__':
    # A class defined in another module (module_name.py) is instantiated as module_name.ClassName().
    parser = html_parser.MyHTMLParser()
    # Insert the HTML here.
    parser.feed('<html><head><title>Parser Test</title></head>'
                '<body><BLOCKQUOTE>Quoted content</BLOCKQUOTE></body></html>')
    # The text collected by the parser instance can be read back from its attribute.
    print(parser.data_str)
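As background on why the module prefix is needed: `import some_module` binds only the module name in the importing file, so a class inside it is not visible under its bare name. That is what produced the NameError in the question. The sketch below shows the same rule using the standard `html.parser` module instead of the question's html_parser.py, so that it runs on its own; `bare_name_lookup` is a helper name made up for this illustration.

```python
import html.parser  # binds only the name "html" in this file


def bare_name_lookup():
    """Return the NameError message for the unqualified class name, or None."""
    try:
        HTMLParser  # bare name; only defined once a "from ... import" has run
    except NameError as exc:
        return str(exc)
    return None


error_before = bare_name_lookup()
print(error_before)

# Option 1: qualify the class with the module it lives in.
qualified = html.parser.HTMLParser()

# Option 2: import the class name itself, then use it unqualified.
from html.parser import HTMLParser

error_after = bare_name_lookup()
direct = HTMLParser()
print(error_after)
```

Applied to the question, either `parser = html_parser.MyHTMLParser()` after `import html_parser`, or `from html_parser import MyHTMLParser` followed by `parser = MyHTMLParser()`, should work.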
html_parser.py
from html.parser import HTMLParser
class MyHTMLParser(HTMLParser):
    def __init__(self):
        # Superclass initialization.
        super().__init__()
        # Variables are initialized here.
        self.data_str = ""
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    def handle_data(self, data):
        print("Encountered some data :", data)
        self.data_str += "!#@@@@@#!" + data
The expected output is as follows.
Encountered a start tag: html
Encountered a start tag: head
Encountered a start tag: title
Encountered some data : Parser Test
Encountered an end tag : title
Encountered an end tag : head
Encountered a start tag: body
Encountered a start tag: blockquote
Encountered some data : Quoted content
Encountered an end tag : blockquote
Encountered an end tag : body
Encountered an end tag : html
!#@@@@@#!Parser Test!#@@@@@#!Quoted content
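If the "!#@@@@@#!" marker is only there so the text pieces can be separated again later, one possible alternative is to keep the pieces in a list and join them at the end. This is just a sketch under that assumption, not something the question asked for: `TextCollector` and `chunks` are names made up here, and skipping whitespace-only pieces is my own choice.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical variant of MyHTMLParser that keeps each text piece separately."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():  # skip pieces that contain only whitespace
            self.chunks.append(data)


collector = TextCollector()
collector.feed('<html><head><title>Parser Test</title></head>'
               '<body><BLOCKQUOTE>Quoted content</BLOCKQUOTE></body></html>')
collector.close()  # force processing of any text still buffered

print(collector.chunks)
joined = "!#@@@@@#!".join(collector.chunks)
print(joined)
```

Note that joining this way puts the marker only between pieces, so unlike `data_str` above the result has no leading marker.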
Upvotes: 2