Reputation: 37
I'm working on a Python program to extract the contents of all the <coordinates> tags within a KML file.
import re
KML = open('NYC_Tri-State_Area.kml','r')
NYC_Coords = open('NYC_Coords.txt', 'w')
coords = re.findall(r'<coordinates>+(.)+<\/coordinates>', KML.read())
for coord in coords:
    NYC_Coords.write(str(coord) + "\n")
KML.close()
NYC_Coords.close()
I tested the regex on the file within RegExr and it worked properly.
Here is a small sample of the kml file I'm reading: http://puu.sh/bhayn/2e233a1033.png
Every line of the output file contains a single 0, except the last one, which is empty.
Upvotes: 1
Views: 76
Reputation: 70732
It seems you have the + operators placed outside of your grouping. With >+ the pattern matches a literal > "one or more" times, and because the dot . sits inside a repeated capturing group (.)+ only the last iteration is captured, in this case 0 for each match result.
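To see the repeated-group behaviour in isolation, here is a minimal sketch; the input string 'abc' is just an illustrative value, not taken from your file:

```python
import re

# (.)+ matches the whole string 'abc' in one go, but the group is
# overwritten on every repetition, so only the last character survives.
last_only = re.findall(r'(.)+', 'abc')

# Moving the quantifier inside the group captures the full run instead.
whole_run = re.findall(r'(.+)', 'abc')

print(last_only)  # ['c']
print(whole_run)  # ['abc']
```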
Remove the beginning + operator and move the one placed outside of the group to the inside.
coords = re.findall(r'<coordinates>(.+?)</coordinates>', KML.read())
Note: Use +? to prevent greediness. You probably also want the s (dotall) modifier here (re.DOTALL in Python) so that . can match the newlines inside a multi-line <coordinates> element.
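A rough comparison of the two patterns, assuming the KML looks something like the made-up sample string below (the real file's layout may differ):

```python
import re

# Hypothetical KML fragment: one single-line and one multi-line element.
sample = (
    "<Placemark><coordinates>-74.0060,40.7128,0</coordinates></Placemark>\n"
    "<Placemark><coordinates>\n"
    "  -73.9866,40.7306,0\n"
    "  -73.9352,40.7306,0\n"
    "</coordinates></Placemark>\n"
)

# Original pattern: captures only the last character before the closing
# tag, and misses the multi-line element because . stops at newlines.
broken = re.findall(r'<coordinates>+(.)+<\/coordinates>', sample)

# Corrected pattern: lazy quantifier inside the group, plus re.DOTALL
# so that . also matches newlines.
fixed = re.findall(r'<coordinates>(.+?)</coordinates>', sample, re.DOTALL)

# Trim the whitespace that surrounds multi-line coordinate blocks.
cleaned = [c.strip() for c in fixed]

print(broken)        # ['0']
print(len(cleaned))  # 2
print(cleaned[0])    # -74.0060,40.7128,0
```

In your script the same call would take KML.read() in place of sample.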
Upvotes: 3