ofer simchovitch

Reputation: 39

How to define a web page class in Python 2.7 without imports

I need to build a WebPage class that holds the page's path and has some built-in methods such as __str__ and __repr__. The class should be useful later for building a "search engine" that compares pages and returns the best match for a search. The "pages" are HTML files that I saved on my computer.

This is what I have so far:

def remove_html_tags(s):
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c

    return out


class WebPage:
    def __init__(self, filename):

        self.filename = filename

    def process(self):

        f = open(self.filename,'r')
        LINE_lst_1 = f.readlines()
        n = len(LINE_lst_1)

        LINE_lst = LINE_lst_1[1:n-1]

        STRUCTURE = {}

        for i in range(len(LINE_lst)):
            LINE_lst[i] = LINE_lst[i].strip(' \n\t')
            LINE_lst[i] = remove_html_tags(LINE_lst[i])
        for k in range(n-1):
            for line in LINE_lst:
                if len(line) == 0:
                    LINE_lst.remove(line)
        STRUCTURE['body_lines'] = LINE_lst[1:]
        STRUCTURE['title'] = LINE_lst[0]        
        global STRUCTURE

    def __str__(self):
        return STRUCTURE['title']+'\n' +' '.join(STRUCTURE['body_lines'])
    def __repr__(self):
        return STRUCTURE['title']

Everything is basically working, but I want to do it without creating a global dictionary, which doesn't hold the information for long. I want to change the process method so that I won't need the global STRUCTURE dictionary.

Any ideas?

Upvotes: 0

Views: 29

Answers (1)

Kevin

Reputation: 76184

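Use self.STRUCTURE instead. Storing the dictionary as an instance attribute keeps the data with each WebPage object, so no global is needed.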

def process(self):
    #...
    self.STRUCTURE = {}
    #...
    self.STRUCTURE['body_lines'] = LINE_lst[1:]
    self.STRUCTURE['title'] = LINE_lst[0]        

def __str__(self):
    return self.STRUCTURE['title']+'\n' +' '.join(self.STRUCTURE['body_lines'])
def __repr__(self):
    return self.STRUCTURE['title']

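... Although you may want to consider choosing a new variable name: by convention (PEP 8), all-caps names are used for constants, so a lowercase name such as self.structure would read more naturally.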
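
For completeness, here is a minimal sketch of how the whole class might look once the state lives on the instance. It goes one step further than the snippet above and stores title and body_lines directly as attributes instead of inside a dictionary. The parse method and the attribute names are assumptions made for illustration, not something from the question, and remove_html_tags is copied from the question unchanged. No imports are used.

```python
def remove_html_tags(s):
    # Same logic as in the question: drop everything between '<' and '>'.
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out


class WebPage(object):
    def __init__(self, filename):
        self.filename = filename
        # Filled in by process(); set here so __str__ and __repr__
        # also work before the file has been read.
        self.title = ''
        self.body_lines = []

    def parse(self, lines):
        # Hypothetical helper (not in the question): cleans the raw lines
        # and stores the result on this instance only.
        # Like the original, it skips the first and last line of the file.
        cleaned = [remove_html_tags(line.strip(' \n\t')) for line in lines[1:-1]]
        cleaned = [line for line in cleaned if line]  # drop empty lines
        self.title = cleaned[0] if cleaned else ''
        self.body_lines = cleaned[1:]

    def process(self):
        f = open(self.filename, 'r')
        try:
            lines = f.readlines()
        finally:
            f.close()  # close the file even if reading fails
        self.parse(lines)

    def __str__(self):
        return self.title + '\n' + ' '.join(self.body_lines)

    def __repr__(self):
        return self.title
```

With this layout, two WebPage objects each keep their own title and body_lines, whereas a module-level STRUCTURE would be overwritten by whichever page called process() last.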

Upvotes: 1
