Reputation: 1804
I am somewhat new to Python and programming in general so I apologize. By the way, thanks in advance.
I am parsing an xml document (kml specifically which is used in Google Earth) using Python 2.5, cElementTree and expat. I am trying to pull out all the text from the 'name', 'description' and 'coordinates' nodes inside each 'placemark' node for each geometry type (i.e. polylines, polygon, point), but I want to keep the geometry types separate. For example, I want only the 'name','description', and 'coordinates' text for every placemark that is part of a 'polygon' (i.e. it has a 'polygon' node). I will need to do this for 'polylines' and 'points' also. I have figured out a way to do this, but the code is long a verbose and specific to each geometry type, which leads me to my question.
Ideally, I would like to use the same code for each geometry type, but the problem is that each geometry type has a different node structure (i.e. different node names and number of nested nodes). So for proof of concept I thought this would be a good opportunity to use/learn recursion to drill down the node tree of 'placemark' node and get the information I was looking for. I have looked at the many posts on Python recursion and am still having problems with implementing the solutions provided.
The sample xml for a 'placemark' node is:
<Placemark>
<name>testPolygon</name>
<description>polygon text</description>
<styleUrl>#msn_ylw-pushpin</styleUrl>
<Polygon>
<tessellate>1</tessellate>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-81.4065,31.5072,0 -81.41269,31.45992,0 -81.34490,31.459696,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
The recursion function I am using is:
def getCoords( child, searchNode ):
# Get children of node
children = child.getchildren()
# If node has one or more child
if len( children ) >= 1 :
# Loop through all the children
for child in children:
# call to recursion function
getCoords( child, searchNode )
# If does not have children and is the 'searchNode'
elif len( children ) == 0 and child.tag == searchNode:
# Return the text inside the node. This is where it is not working
# Other posts recommended returning the function like
# return getCoords(child, searchNode), but I am getting an unending loop
return child.text
# Do nothing if node doesn't have children and does not match 'searchNode'
else:
print 'node does not have children and is not what we are looking for'
I am calling the recursion function like:
searchNode = 'coordinates'
# loop through all 'Placemark nodes' in document
for mark in placemark:
# Get children of 'Placemark' node
children = mark.getchildren()
# Loop through children nodes
for child in children:
# if a 'Polygon' node is found
if child.tag == 'Polygon':
# call recursion function
getCoords( child, searchNode)
I realize, at least, part of my problem is the return value. Other posts recommended returning the function, which I interpreted to be 'return getCoords(child, searchNode), but I am getting an unending loop. Also, I realize this could be posted on the GIS site, but I think this is more of a general programming question. Any ideas?
Upvotes: 1
Views: 4793
Reputation: 10129
With recursion you want to pay attention to your base cases, and your recursive cases. Whatever your base cases happen to be, if you expect to be able to collect information from your recursion, they have to return data that your recursive cases can (and more importantly do) use. Similarly you need to make sure the data your recursive cases return can be used by each other.
First identify your base and recursive cases.
The base cases are the "leaf" nodes, with no children. In a base case you want to just return some data, and not call the recursive function again. This is what allows you to get "back up the stack" as they say, and prevent infinite recursion.
The recursive cases will require you to save the data collected from a series of recursive calls, which is almost what you're doing in your for
loop.
I noticed that you have
# Recursive case: node has one or more child
if len( children ) >= 1 :
# Loop through all the children
for child in children:
# call to recursion function
getCoords( child, searchNode )
but what are you doing with the results of your getCoords calls?
You either want to save the results in some sort of a data structure which you can return at the end of your for loop, or if you're not interested in saving the results themselves, just print your base case 1 ( successful search ) when you reach it instead of returning it. Because now your base case 1 is just returning up the stack to an instance that isn't doing anything with the result! So try:
# If node has one or more child
if len( children ) >= 1 :
# Data structure for your results
coords = []
# Loop through all the children
for child in children:
# call to recursion function
result = getCoords( child, searchNode )
# Add your new results together
coords.extend(result)
# Give the next instance up the stack your results!
return coords
Now since your results are in a list and you're using the extend()
method you've got to make your base cases return lists as well!
# Base case 1: does not have children and is the 'searchNode'
elif len( children ) == 0 and child.tag == searchNode:
# Return the text from the node, inside a list
return [child.text]
# Base case 2: doesn't have children and does not match 'searchNode'
else:
# Return empty list so your extend() function knows what to do with the result
return []
This should just give you a single list in the end, which you'll probably want to store in a variable. I've just printed the results here:
searchNode = 'coordinates'
# loop through all 'Placemark nodes' in document
for mark in placemark:
# Get children of 'Placemark' node
children = mark.getchildren()
# I imagine that getchildren() might return None, so check it
# otherwise you'll get an error when trying to iterate on it
if children:
# Loop through children nodes
for child in children:
# if a 'Polygon' node is found
if child.tag == 'Polygon':
# call recursion function and print (or save) result
print getCoords( child, searchNode)
Upvotes: 6
Reputation: 658
You're not doing anything with the result of the recursion call when the node is the searchNode.
You need to aggregate the results of recursive calls to the children of a node or just use print child.text instead of return child.text.
Upvotes: 0