kokazaki

Reputation: 11

HTML Coding With Python

I'm trying to convert an XML file to HTML using Python. We have a .css file that defines the formatting of the output. We have been trying to run the following code:

def main():
    infile = open("WTExcerpt.xml", "r", encoding="utf8")
    headline=[]
    text = infile.readline()
    outfile = open("DemoWT.html", "w")
    print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
    print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)               
    while text!="":
        #print(text)
        text = infile.readline()
        text = text.replace("<w>", "")

        if "<title>" in text and "</title>" in text:
            print("<h1>",text,"</h1>\n",file=outfile)
        elif text=="<head>":
            while text!="</head>":
                headline.append(text)
                print("<h3>headline<\h3>\n",file=outfile)       


main()

but we don't know how to make Python treat "text" and "headline" as variables (whose values change on every pass through the loop) rather than as literal strings. Do you have any idea? Thank you very much.

Upvotes: 1

Views: 92

Answers (2)

n1c9

Reputation: 2687

A couple of issues I see:

1. Instead of initially creating headline as an empty list, why not just assign it inside the loop?
2. Your inner 'while' loop will never complete, because text is never updated inside it. Instead of using a while loop, you should use a for loop like so:

def main():
    infile = open("WTExcerpt.xml", "r", encoding="utf8")
    outfile = open("DemoWT.html", "w")
    print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
    print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)
    in_headline = False  # must be defined before the first 'elif in_headline' check
    headline = ""
    for line in infile:
        # strip() removes the trailing newline so the tag comparisons below can match
        text = line.replace("<w>", "").strip()
        if "<title>" in text and "</title>" in text:
            print("<h1>",text,"</h1>\n",file=outfile)
        elif text=="<head>":
            in_headline = True
            headline = ""
        elif text == "</head>":
            in_headline = False
            print("<h3>", headline, "</h3>\n", file=outfile)
        elif in_headline:
            headline += text + " "
    # close both files so the output is flushed to disk
    infile.close()
    outfile.close()
main()

You should iterate over the file object instead of using a while loop: first, because the way you structured the while loop it would never end, and second, because it's much more "Pythonic" :).

Upvotes: 0

Stuart

Reputation: 9858

You seem already to have worked out how to output a variable along with some string literals:

print("<h1>",text,"</h1>\n",file=outfile)

or alternatively

print("<h1>{content}</h1>\n".format(content=text), file=outfile)

or just

print("<h1>" + text + "</h1>\n", file=outfile)
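These three forms are not quite identical: print separates its arguments with a space by default, so the comma version puts spaces around the content, while the format and concatenation versions do not. A small sketch of the difference, writing to an in-memory buffer (io.StringIO and the render helper are used here only for illustration) instead of outfile:

```python
import io

def render(text):
    """Return the three print variants for the same text, one per line."""
    buf = io.StringIO()  # stand-in for outfile, for illustration only
    print("<h1>", text, "</h1>", file=buf)                      # commas: print inserts spaces
    print("<h1>{content}</h1>".format(content=text), file=buf)  # str.format: no extra spaces
    print("<h1>" + text + "</h1>", file=buf)                    # concatenation: no extra spaces
    return buf.getvalue().splitlines()

for variant in render("The Winter's Tale"):
    print(variant)
```

If the extra spaces matter, passing sep="" to print in the comma version removes them.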

The problem is more with how your loop reads in the headline - you need something like a flag variable (in_headline) to keep track of whether we are currently parsing text that is inside a <head> tag or not.

def main():
    with open("WTExcerpt.xml", "r", encoding="utf8") as infile, open("DemoWT.html", "w") as outfile:
        print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
        print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)
        in_headline = False  # True while we are between <head> and </head>
        headline = ""
        for line in infile:
            # strip() drops the trailing newline so the == comparisons below can match
            text = line.replace("<w>", "").strip()
            if "<title>" in text and "</title>" in text:
                print("<h1>",text,"</h1>\n",file=outfile)
            elif text=="<head>":
                in_headline = True
                headline = ""
            elif text == "</head>":
                in_headline = False
                print("<h3>", headline, "</h3>\n", file=outfile)
            elif in_headline:
                headline += text + " "

main()

However, it is advisable to use an XML parser instead of, effectively, writing your own. This quickly becomes a complicated exercise: for example, this code will break if <title>s are ever split across multiple lines, or if anything else is ever on the same line as the <head> tag.
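As a rough illustration of the parser approach, here is a minimal sketch using xml.etree.ElementTree from the standard library. The sample XML, the convert function name, and the assumption that <title> and <head> elements hold the text (possibly wrapped in <w> elements) are all hypothetical, since the real structure of WTExcerpt.xml is not shown in the question.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample; the real WTExcerpt.xml may be structured differently.
SAMPLE = """<play>
  <title>The Winter's
  Tale</title>
  <head><w>Actus</w> <w>Primus</w></head>
</play>"""

def convert(xml_text):
    """Turn <title> into <h1> and <head> into <h3>; other elements are ignored."""
    root = ET.fromstring(xml_text)
    parts = []
    for elem in root.iter():
        if elem.tag not in ("title", "head"):
            continue
        # itertext() gathers text from nested elements such as <w>;
        # split/join collapses line breaks and repeated spaces.
        content = " ".join("".join(elem.itertext()).split())
        tag = "h1" if elem.tag == "title" else "h3"
        # quote=False escapes only &, < and >, leaving apostrophes readable
        parts.append("<{0}>{1}</{0}>".format(tag, escape(content, quote=False)))
    return "\n".join(parts)

print(convert(SAMPLE))
```

In this sketch the title is deliberately split across two lines and the <head> content sits on the same line as its tag, the two cases that trip up the line-by-line version; the parser handles both because it works on elements rather than lines.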

Upvotes: 1
