Reputation: 39
I need to build a WebPage class that holds the page's path and has some built-in methods such as __str__ and __repr__. This class should be useful later for building a "search engine" that compares pages and returns the best match for a search. The "pages" come in the form of HTML files that I saved on my computer.
This is what I have so far:
def remove_html_tags(s):
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out
class WebPage:
    def __init__(self, filename):
        self.filename = filename
    def process(self):
        f = open(self.filename,'r')
        LINE_lst_1 = f.readlines()
        n = len(LINE_lst_1)
        LINE_lst = LINE_lst_1[1:n-1]
        STRUCTURE = {}
        for i in range(len(LINE_lst)):
            LINE_lst[i] = LINE_lst[i].strip(' \n\t')
            LINE_lst[i] = remove_html_tags(LINE_lst[i])
        for k in range(n-1):
            for line in LINE_lst:
                if len(line) == 0:
                    LINE_lst.remove(line)
        STRUCTURE['body_lines'] = LINE_lst[1:]
        STRUCTURE['title'] = LINE_lst[0]
        global STRUCTURE
    def __str__(self):
        return STRUCTURE['title']+'\n' +' '.join(STRUCTURE['body_lines'])
    def __repr__(self):
        return STRUCTURE['title']
Well, everything is basically working, but I want to do it all without creating a global dictionary that doesn't hold the information for long.
I want to change the method process in a way that I won't need the STRUCTURE dictionary.
Any ideas?
Upvotes: 0
Views: 29
Reputation: 76184
Use self.STRUCTURE instead.
def process(self):
    #...
    self.STRUCTURE = {}
    #...
    self.STRUCTURE['body_lines'] = LINE_lst[1:]
    self.STRUCTURE['title'] = LINE_lst[0]
def __str__(self):
    return self.STRUCTURE['title']+'\n' +' '.join(self.STRUCTURE['body_lines'])
def __repr__(self):
    return self.STRUCTURE['title']
... Although you may want to consider choosing a new variable name, since all-uppercase names conventionally mark constants in Python.
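If you'd rather drop the dictionary entirely, one option is to store the two values directly as instance attributes. Below is a minimal sketch of that idea, not a definitive implementation. The attribute names title and body_lines are my own choice (taken from your dictionary keys), and I'm assuming, as your code does, that the first and last lines of the file should be skipped and that the first non-empty line left is the title. Unlike your version, it falls back to an empty title instead of raising when the file has no text.

```python
def remove_html_tags(s):
    """Strip HTML tags from s (same logic as the function in the question)."""
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out


class WebPage:
    def __init__(self, filename):
        self.filename = filename
        # Assumed attribute names; they replace STRUCTURE['title'] and
        # STRUCTURE['body_lines'] and live as long as the object does.
        self.title = ""
        self.body_lines = []

    def process(self):
        # The with-block closes the file even if reading fails.
        with open(self.filename, 'r') as f:
            raw_lines = f.readlines()
        # As in the question: skip the first and last line of the file,
        # strip surrounding whitespace, then remove the tags.
        cleaned = [remove_html_tags(line.strip(' \n\t')) for line in raw_lines[1:-1]]
        # Filter out empty lines in one pass instead of calling remove()
        # on the list while iterating over it.
        lines = [line for line in cleaned if line]
        self.title = lines[0] if lines else ""
        self.body_lines = lines[1:]

    def __str__(self):
        return self.title + '\n' + ' '.join(self.body_lines)

    def __repr__(self):
        return self.title
```

Usage would look like page = WebPage("page.html"), then page.process(), then print(page), where "page.html" is just a placeholder file name. Each WebPage object keeps its own title and body, so processing a second page no longer overwrites the data of the first one.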
Upvotes: 1