Reputation: 11
I'm trying to convert an XML file to HTML using Python. We have a .css file that contains the formatting rules for the output. We have been trying to run the following code:
def main():
    infile = open("WTExcerpt.xml", "r", encoding="utf8")
    headline=[]
    text = infile.readline()
    outfile = open("DemoWT.html", "w")
    print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
    print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)
    while text!="":
        #print(text)
        text = infile.readline()
        text = text.replace("<w>", "")
        if "<title>" in text and "</title>" in text:
            print("<h1>",text,"</h1>\n",file=outfile)
        elif text=="<head>":
            while text!="</head>":
                headline.append(text)
            print("<h3>headline<\h3>\n",file=outfile)

main()
However, we don't know how to make Python treat "text" and "headline" as our variables (whose values change every time the loop is executed) rather than as literal strings. Do you have any ideas? Thank you very much.
Upvotes: 1
Views: 92
Reputation: 2687
A couple of issues I see:
1. Instead of initially creating headline as an empty list, why not just assign it inside the loop?
2. Your inner 'while' loop will never complete, because text is never updated inside it. Instead of using a while loop, you should use a for loop, like so:
def main():
    infile = open("WTExcerpt.xml", "r", encoding="utf8")
    outfile = open("DemoWT.html", "w")
    print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
    print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)
    # both must exist before the loop, or the 'elif in_headline' check raises NameError
    in_headline = False
    headline = ""
    for line in infile:
        text = line.replace("<w>", "")
        # each line keeps its trailing newline, so compare a stripped copy with the tags
        tag = text.strip()
        if "<title>" in text and "</title>" in text:
            print("<h1>",text,"</h1>\n",file=outfile)
        elif tag=="<head>":
            in_headline = True
            headline = ""
        elif tag == "</head>":
            in_headline = False
            print("<h3>", headline, "</h3>\n", file=outfile)
        elif in_headline:
            headline += text
    infile.close()
    outfile.close()

main()
You should iterate over the file object instead of using a while loop: first, because the way you structured the inner while loop means it would never end, and second, because it's far more "Pythonic" :).
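To illustrate iterating over a file object, here is a minimal sketch. It uses io.StringIO as a stand-in for the opened file, and the sample content is made up for illustration rather than taken from WTExcerpt.xml. It also shows something easy to miss: each line produced by the loop keeps its trailing newline, so a direct comparison such as text=="<head>" will not match unless the line is stripped first.

```python
import io

# Stand-in for the opened XML file; the sample content is made up for illustration.
infile = io.StringIO("<head>\nACT I\n</head>\n")

raw_lines = []
stripped_lines = []
for line in infile:                      # the loop stops by itself at end of file
    raw_lines.append(line)               # each line still carries its trailing "\n"
    stripped_lines.append(line.strip())  # strip() removes surrounding whitespace

print(raw_lines[0] == "<head>")       # False: the line is actually "<head>\n"
print(stripped_lines[0] == "<head>")  # True once the newline is stripped
```

The same loop works unchanged on a real file object returned by open().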
Upvotes: 0
Reputation: 9858
You seem already to have worked out how to output a variable along with some string literals:
print("<h1>",text,"</h1>\n",file=outfile)
or alternatively
print("<h1>{content}</h1>\n".format(content=text), file=outfile)
or just
print("<h1>" + text + "</h1>\n", file=outfile)
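One difference between these forms is worth seeing side by side: passing several arguments to print joins them with a space by default, while concatenation and formatting do not. The sketch below assumes a made-up value for text and writes to an io.StringIO buffer standing in for outfile; the f-string form needs Python 3.6 or later.

```python
import io

text = "The Winter's Tale"  # sample value, made up for illustration
out = io.StringIO()         # stands in for outfile

# Separate arguments: print inserts a space between each of them.
print("<h1>", text, "</h1>", file=out)
# sep="" removes those extra spaces.
print("<h1>", text, "</h1>", sep="", file=out)
# An f-string builds the whole string first, so no spaces are added.
print(f"<h1>{text}</h1>", file=out)

result = out.getvalue().splitlines()
for row in result:
    print(repr(row))
```

Whitespace inside an h1 element rarely matters once the browser renders it, so this is mostly a question of how tidy you want the generated HTML to be.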
The problem is more with how your loop reads in the headline - you need something like a flag variable (in_headline) to keep track of whether we are currently parsing text that is inside a <head> tag or not.
def main():
    with open("WTExcerpt.xml", "r", encoding="utf8") as infile, open("DemoWT.html", "w") as outfile:
        print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
        print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)
        in_headline = False
        headline = ""
        for line in infile:
            text = line.replace("<w>", "")
            # each line keeps its trailing newline, so compare a stripped copy with the tags
            tag = text.strip()
            if "<title>" in text and "</title>" in text:
                print("<h1>",text,"</h1>\n",file=outfile)
            elif tag=="<head>":
                in_headline = True
                headline = ""
            elif tag == "</head>":
                in_headline = False
                print("<h3>", headline, "</h3>\n", file=outfile)
            elif in_headline:
                headline += text

main()
However, it is advisable to use an XML parser instead of, effectively, writing your own. This quickly becomes a complicated exercise - for example, this code will break if a <title> element is ever split across multiple lines, or if anything else is ever on the same line as the <head> tag.
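As a rough sketch of the parser route, the standard library's xml.etree.ElementTree can do the tag matching for you. The sample document below, including its root element name play, is an assumption made for illustration; the real WTExcerpt.xml may be structured differently, and the helper names element_text and convert are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample; the real WTExcerpt.xml may be structured differently.
SAMPLE = """<play>
<title><w>The</w> <w>Winter's</w> <w>Tale</w></title>
<head>
<w>ACT</w> <w>I</w>
</head>
</play>"""

def element_text(elem):
    """Gather all text inside elem (including nested <w> children) and collapse whitespace."""
    return " ".join("".join(elem.itertext()).split())

def convert(xml_source):
    """Return <h1>/<h3> lines for every <title>/<head> element, in document order."""
    root = ET.fromstring(xml_source)
    parts = []
    for elem in root.iter():
        if elem.tag == "title":
            # quote=False escapes &, < and > but leaves apostrophes readable
            parts.append("<h1>{}</h1>".format(html.escape(element_text(elem), quote=False)))
        elif elem.tag == "head":
            parts.append("<h3>{}</h3>".format(html.escape(element_text(elem), quote=False)))
    return "\n".join(parts)

print(convert(SAMPLE))
```

Because the parser works on elements rather than lines, it does not matter whether a tag spans several lines or shares a line with other content. For a file on disk, ET.parse("WTExcerpt.xml").getroot() would replace ET.fromstring, assuming the file is well-formed XML.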
Upvotes: 1