Reputation: 41
I'm trying to copy sections in a file within a set of XML tags
> <tag>I want to copy the data here</tag>`
There are multiple sections of text I want to extract in the file so I'm trying to loop through the file to find each one. I just wanted to do this on a line-by-line basis till I figured out how to parse the lines of unwanted text and created the following code:
InputFile=open('xml_input_File.xml','r')
OutputFile=open('xml_output_file.xml', 'w')
check = 0
for line in InputFile.readlines():
if line.find("<STARTTAG>"):
check = 1
elif line.find(r"<//STARTTAG>"):
check = 0
if check == 1:
OutputFile.write(line)
The problem I'm having is it simply copies the whole file and not just the sections I would like.
I know the code is not very pretty but I'm still learning and its going to be a "d'oh!" moment but thanks for your help!!
Cheers
Upvotes: 0
Views: 140
Reputation: 1582
There's a few issues with your code:
"<STARTTAG> ... </STARTTAG>"
, capturing lines isn't going to cut it as you're going to grab at least the <STARTTAG>
instance.r"<//STARTTAG>"
) but you're using two forward slashes. From your example above, it looks like the closing tags only have one forward slash. I'm not sure why you need to use the literal string prefix at all here. If this is incorrect, that's probably why the check variable is never set to 0 (hence, the code copies the whole file).Edit: the point other posters have made about the return value of find() is very valid as well. Using the in
keyword is likely a better bet.
You need to look into splitting up your input (parsing), either manually (via split()) or by some regular expressions. Alternatively, you could try and groom your input into a compliant XML format and then use one of the many freely available libraries to handle this sort of thing.
Hope this helps!
Upvotes: 1
Reputation: 8290
find returns index of substring in the line. Probably that starttag is on the beginning of the line (index is zero) so if doesn't work as it should.
Try:
if line.find("<STARTTAG>") != -1:
or even better
if "<starttag>" in line:
or use some XML parser for python.
Upvotes: 0
Reputation: 76147
Help on method_descriptor:
find(...)
S.find(sub[, start[, end]]) -> int
Return the lowest index in S where substring sub is found,
such that sub is contained within s[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
-1 is also a True
value.
try:
if "<STARTTAG>" in line:
etc.
Also, forward slash doesn't need to be escaped (even less in raw strings!).
Upvotes: 0