Reputation: 14532
The XML:
<?xml version="1.0"?>
<pages>
    <page>
        <url>http://example.com/Labs</url>
        <title>Labs</title>
        <subpages>
            <page>
                <url>http://example.com/Labs/Email</url>
                <title>Email</title>
                <subpages>
                    <page/>
                    <url>http://example.com/Labs/Email/How_to</url>
                    <title>How-To</title>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Labs/Social</url>
                <title>Social</title>
            </page>
        </subpages>
    </page>
    <page>
        <url>http://example.com/Tests</url>
        <title>Tests</title>
        <subpages>
            <page>
                <url>http://example.com/Tests/Email</url>
                <title>Email</title>
                <subpages>
                    <page/>
                    <url>http://example.com/Tests/Email/How_to</url>
                    <title>How-To</title>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Tests/Social</url>
                <title>Social</title>
            </page>
        </subpages>
    </page>
</pages>
The code:
# rexml is the XML string read from a URL
from xml.etree import ElementTree as ET

tree = ET.fromstring(rexml)
for node in tree.iter('page'):
    for url in node.iterfind('url'):
        print url.text
    for title in node.iterfind('title'):
        print title.text.encode("utf-8")
    print '-' * 30
The output:
http://example.com/article1
Article1
------------------------------
http://example.com/article1/subarticle1
SubArticle1
------------------------------
http://example.com/article2
Article2
------------------------------
http://example.com/article3
Article3
------------------------------
The XML represents the tree-like structure of a sitemap.
I have been up and down the docs and Google all day and can't figure out how to get the node depth of the entries.
I tried counting the children of each container, but that only works for the first parent; after that it breaks because I can't figure out how to reset the counter. This is probably just a hackish idea anyway.
The desired output:
0
http://example.com/article1
Article1
------------------------------
1
http://example.com/article1/subarticle1
SubArticle1
------------------------------
0
http://example.com/article2
Article2
------------------------------
0
http://example.com/article3
Article3
------------------------------
Upvotes: 17
Views: 20326
Reputation: 8904
from lxml import etree


class Base_Node(object):
    def __init__(self, element: etree.Element, index: int):
        self.element = element
        self.index = index
        # Attribute lookup table with lower-cased attribute names
        self._d = {}
        for attr in self.element.items():
            self._d[attr[0].lower()] = attr[1]

    def __str__(self) -> str:
        return f'tag: {self.tag} path: {self._path} depth: {self.depth}'

    @property
    def tag(self) -> str:
        return self.element.tag

    @property
    def _path(self) -> str:
        # XPath of the element, e.g. /pages/page[1]/subpages/page[1]
        return self.element.getroottree().getpath(self.element)

    @property
    def depth(self) -> int:
        # One '/' per path step, so the root element has depth 1
        return self._path.count('/')

    @property
    def sourceline(self) -> int:
        return self.element.sourceline
Upvotes: 0
Reputation: 11
My approach: a recursive function that lists each node together with its level. You must first set the initial depth of the node you are passing:
# Definition of the recursive function
def listchildrens(node, depth):
    # Print the node, indented according to its depth
    print(" " * depth, "Type", node.tag, "Attributes", node.attrib, "Depth", depth)
    # If the node has children, call the function again for each child
    if len(node) > 0:
        # Children are one level deeper than the current node
        depth += 1
        for child in node:
            listchildrens(child, depth)

# Define the starting depth
startdepth = 1
# Call the function with the XML body (an already parsed element) and the starting depth
listchildrens(xmlBody, startdepth)
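If the depths are needed as data rather than as printed text, the same recursion can be written as a generator. This is only a sketch: `walk_with_depth` and the sample XML are made up for illustration, and the root is assumed to be at depth 0.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the XML from the question
SAMPLE = "<a><b><c/></b><d/></a>"


def walk_with_depth(node, depth=0):
    """Yield (depth, element) pairs in document order."""
    yield depth, node
    for child in node:
        # Children sit one level below the current node
        yield from walk_with_depth(child, depth + 1)


root = ET.fromstring(SAMPLE)
pairs = [(depth, elem.tag) for depth, elem in walk_with_depth(root)]
for depth, tag in pairs:
    # Indent each tag by its depth
    print("  " * depth + tag, depth)
```

Passing a different starting value for `depth` shifts every reported level by the same amount.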
Upvotes: 1
Reputation: 61
import xml.etree.ElementTree as etree

tree = etree.ElementTree(etree.fromstring(rexml))
maxdepth = 0

def depth(elem, level):
    """function to get the maxdepth"""
    global maxdepth
    if (level == maxdepth):
        maxdepth += 1
    # recursive call to function to get the depth
    for child in elem:
        depth(child, level + 1)

depth(tree.getroot(), -1)
print(maxdepth)
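The same number can be computed without a global variable by returning the depth from the recursion. A minimal sketch, assuming the same convention as the code above (the root itself is not counted, so a root without children gives 0); `max_depth` and the sample XML are illustrative names, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the XML from the question
SAMPLE = "<a><b><c/></b><d/></a>"


def max_depth(elem):
    """Return the number of levels below elem (0 for an element without children)."""
    # Each child contributes one level plus whatever lies below it
    return max((1 + max_depth(child) for child in elem), default=0)


result = max_depth(ET.fromstring(SAMPLE))
print(result)
```

Because the function returns its result, it can be called on any subtree without resetting shared state between calls.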
Upvotes: 6
Reputation: 1859
This is another easy way of doing this without using an XML library:
depth = 0
for i in range(int(input())):
    tab = input().count(' ')
    if tab > depth:
        depth = tab
print(depth)
Upvotes: -2
Reputation: 39205
The Python ElementTree API provides iterators for depth-first traversal of an XML tree - unfortunately, those iterators don't provide any depth information to the caller.
But you can write a depth-first iterator that also returns the depth information for each element:
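import xml.etree.ElementTree as ET

def depth_iter(element, tag=None):
    # Stack of child iterators, one per level currently being visited
    stack = []
    stack.append(iter([element]))
    while stack:
        e = next(stack[-1], None)
        if e is None:
            # This level is exhausted, go back up
            stack.pop()
        else:
            # Descend into the children of e
            stack.append(iter(e))
            if tag is None or e.tag == tag:
                yield (e, len(stack) - 1)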
Note that this is more efficient than determining the depth by following the parent links (when using lxml) - i.e. it is O(n) vs. O(n log n) for a reasonably balanced tree, since following the parent links costs O(depth) per element.
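As a usage sketch, the iterator can be applied to a sitemap shaped like the one in the question. The sample XML below is a shortened, hypothetical stand-in, and the conversion from element depth to page level is my assumption about the desired output: with this iterator the root `<pages>` is reported at depth 1, a top-level `<page>` at 2, and each further `<subpages>`/`<page>` pair adds 2.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sitemap in the same shape as the question's XML
SITEMAP = """
<pages>
  <page>
    <url>http://example.com/Labs</url>
    <title>Labs</title>
    <subpages>
      <page>
        <url>http://example.com/Labs/Email</url>
        <title>Email</title>
      </page>
    </subpages>
  </page>
  <page>
    <url>http://example.com/Tests</url>
    <title>Tests</title>
  </page>
</pages>
"""


def depth_iter(element, tag=None):
    """Same iterator as above, repeated so this snippet runs on its own."""
    stack = [iter([element])]
    while stack:
        e = next(stack[-1], None)
        if e is None:
            stack.pop()
        else:
            stack.append(iter(e))
            if tag is None or e.tag == tag:
                yield e, len(stack) - 1


root = ET.fromstring(SITEMAP)
levels = []
for page, depth in depth_iter(root, "page"):
    # Element depth 2, 4, 6, ... becomes page level 0, 1, 2, ...
    level = depth // 2 - 1
    levels.append((level, page.findtext("url")))
    print(level)
    print(page.findtext("url"))
    print(page.findtext("title"))
    print("-" * 30)
```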
Upvotes: 12
Reputation: 857
lxml is best for this, but if you have to use the standard library, don't use iter; walk the tree yourself, so that you always know where you are.
from xml.etree import ElementTree as ET

tree = ET.fromstring(rexml)

def sub(node, tag):
    return node.findall(tag) or []

def print_page(node, depth):
    print "%s" % depth
    url = node.find("url")
    if url is not None:
        print url.text
    title = node.find("title")
    if title is not None:
        print title.text
    print '-' * 30

def find_pages(node, depth=0):
    for page in sub(node, "page"):
        print_page(page, depth)
        subpage = page.find("subpages")
        if subpage is not None:
            find_pages(subpage, depth+1)

find_pages(tree)
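On Python 3, or when the result is needed as data instead of printed output, the same walk can collect rows. This is a sketch under assumptions: `collect_pages` is a hypothetical name and the sample sitemap is a shortened stand-in for the question's XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sitemap in the same shape as the question's XML
SITEMAP = """
<pages>
  <page>
    <url>http://example.com/article1</url>
    <title>Article1</title>
    <subpages>
      <page>
        <url>http://example.com/article1/subarticle1</url>
        <title>SubArticle1</title>
      </page>
    </subpages>
  </page>
  <page>
    <url>http://example.com/article2</url>
    <title>Article2</title>
  </page>
</pages>
"""


def collect_pages(node, depth=0):
    """Return (depth, url, title) for every <page> below node, in document order."""
    rows = []
    for page in node.findall("page"):
        rows.append((depth, page.findtext("url"), page.findtext("title")))
        subpages = page.find("subpages")
        if subpages is not None:
            # Pages inside <subpages> are one level deeper
            rows.extend(collect_pages(subpages, depth + 1))
    return rows


rows = collect_pages(ET.fromstring(SITEMAP))
for depth, url, title in rows:
    print(depth)
    print(url)
    print(title)
    print("-" * 30)
```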
Upvotes: 0
Reputation: 369424
I used lxml.html.
import lxml.html

rexml = ...

def depth(node):
    d = 0
    while node is not None:
        d += 1
        node = node.getparent()
    return d

tree = lxml.html.fromstring(rexml)
for node in tree.iter('page'):
    print depth(node)
    for url in node.iterfind('url'):
        print url.text
    for title in node.iterfind('title'):
        print title.text.encode("utf-8")
    print '-' * 30
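The standard library's ElementTree elements have no `getparent()`, but the same idea can be sketched there with a child-to-parent map built once up front. The names `parent_of` and `SAMPLE` are illustrative assumptions; as in the code above, the root counts as depth 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the XML from the question
SAMPLE = "<pages><page><subpages><page/></subpages></page><page/></pages>"

root = ET.fromstring(SAMPLE)

# Map every child element to its parent; the root has no entry
parent_of = {child: parent for parent in root.iter() for child in parent}


def depth(node):
    """Count the node and all of its ancestors, so the root has depth 1."""
    d = 0
    while node is not None:
        d += 1
        node = parent_of.get(node)
    return d


depths = [depth(page) for page in root.iter("page")]
print(depths)
```

The map has to be rebuilt if the tree is modified afterwards.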
Upvotes: 4