Reputation: 327
I'd like to fix a script, but I've tangled myself up in it, so I'm asking SO.
Script is:
from xml.dom.minidom import parse
from itertools import groupby

yXML = parse('/root/Desktop/gb/data/yConfig.xml')
servers = []
for AllConfigurations in yXML.getElementsByTagName('AllConfigurations'):
    for DeployConfigurations in AllConfigurations.getElementsByTagName('DeployConfigurations'):
        for Servers in DeployConfigurations.getElementsByTagName('Servers'):
            for Group in Servers.getElementsByTagName('Group'):
                for GApp in Group.getElementsByTagName('GApp'):
                    for Server in Group.getElementsByTagName('Server'):
                        servers.append((Server.getAttribute('name'),
                                        Group.getAttribute('name'),
                                        Server.getAttribute('ip'),
                                        GApp.getAttribute('type')))

def line(machine, group, ip, services):
    return " | ".join([machine.ljust(22), group.ljust(22), ip.ljust(18), services])

print line("Machine", "Group", "IP", "Services")
print line("----------", "----------", "----------", "----------")
for server, services in groupby(sorted(servers), lambda server: server[0:3]):
    print line("- " + server[0], server[1], server[2],
               ", ".join(service[3] for service in set(services)))
XML is:
<AllConfigurations>
  <DeployConfigurations>
    <Servers>
      <Group id="1" name="The Perfect Life" username="root" password="mypasswd123" state="">
        <GApp id="1" name="JBoss Servers" type="JBoss" path="/root/Desktop/jboss-as-7.0.2.Final/" state="">
          <Server id="1" name="Jboss1" ip="192.168.1.250" path="/root/Desktop/jboss-as-7.0.2.Final/" username="" password="" state="" />
          <Server id="2" name="Jboss2" ip="192.168.1.251" path="/root/Desktop/jboss-as-7.0.2.Final/" username="" password="" state="" />
          <Server id="3" name="Jboss3" ip="192.168.1.252" path="/root/Desktop/jboss-as-7.0.2.Final/" username="" password="" state="" />
          <Server id="4" name="Jboss4" ip="192.168.1.253" path="/root/Desktop/jboss-as-7.0.2.Final/" username="" password="" state="" />
        </GApp>
        <GApp id="2" name="Tomcat Servers" type="Tomcat" path="/root/Desktop/apache-tomcat-7.0.22/" state="">
          <Server id="1" name="Tom1" ip="192.168.1.250" path="/root/Desktop/apachee/" username="" password="" state="" />
          <Server id="2" name="Tom2" ip="192.168.1.251" path="/root/Desktop/apache-tomcat-7.0.22/" username="" password="" state="" />
          <Server id="3" name="Tom3" ip="192.168.1.252" path="/root/Desktop/apache-tomcat-7.0.22/" username="" password="" state="" />
          <Server id="4" name="Tom4" ip="192.168.1.111" path="/root/Desktop/apache-tomcat-7.0.22/" username="" password="" state="" />
        </GApp>
      </Group>
    </Servers>
  </DeployConfigurations>
</AllConfigurations>
Current output is:
Machine                | Group                  | IP                 | Services
----------             | ----------             | ----------         | ----------
- Jboss1               | The Perfect Life       | 192.168.1.250      | Tomcat, JBoss
- Jboss2               | The Perfect Life       | 192.168.1.251      | Tomcat, JBoss
- Jboss3               | The Perfect Life       | 192.168.1.252      | JBoss, Tomcat
- Jboss4               | The Perfect Life       | 192.168.1.253      | JBoss, Tomcat
- Tom1                 | The Perfect Life       | 192.168.1.250      | JBoss, Tomcat
- Tom2                 | The Perfect Life       | 192.168.1.251      | Tomcat, JBoss
- Tom3                 | The Perfect Life       | 192.168.1.252      | JBoss, Tomcat
- Tom4                 | The Perfect Life       | 192.168.1.111      | JBoss, Tomcat
The issues are:
1- As you can see at Tom4, there is no JBoss server on 192.168.1.111. This server is only for Tomcat. Jboss4 has only JBoss (253), and the others (250, 251, 252) have both. The Services column is wrong.
2- Each IP is printed more than once. I can't work out how to merge them...
3- The Machine column should list every machine name that shares an IP on one line.
The output should look like this:
Machine                | Group                  | IP                 | Services
----------             | ----------             | ----------         | ----------
- Jboss1 / Tom1        | The Perfect Life       | 192.168.1.250      | JBoss, Tomcat
- Jboss2 / Tom2        | The Perfect Life       | 192.168.1.251      | JBoss, Tomcat
- Jboss3 / Tom3        | The Perfect Life       | 192.168.1.252      | JBoss, Tomcat
- Jboss4               | The Perfect Life       | 192.168.1.253      | JBoss
- Tom4                 | The Perfect Life       | 192.168.1.111      | Tomcat
So, what should I do?
Thanks
Upvotes: 0
Views: 154
Reputation: 27060
Warning: this answer is huge.
You have a bunch of problems in your code.
itertools.groupby() used incorrectly
The most relevant problem is that you are sorting and grouping your servers using two different key functions. groupby() only merges consecutive items that share a key, so the sequence must already be sorted by the same key function that groups it. In your case, since you are going to group by IP (the third element of the server tuple), the key function should be:
def get_ip(server):
    return server[2]
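To see why the sort key matters, here is a minimal sketch. The three sample tuples and the helper name group_sizes are made up for illustration; only get_ip comes from the answer. It counts how many items groupby() puts in each run, first on unsorted input and then on input sorted by the same key. It is written with print() calls that take a single argument, so it should behave the same on Python 2 and Python 3.

```python
from itertools import groupby

# Hypothetical sample data: (name, group, ip, service) tuples.
servers = [
    ("Jboss1", "The Perfect Life", "192.168.1.250", "JBoss"),
    ("Tom4", "The Perfect Life", "192.168.1.111", "Tomcat"),
    ("Tom1", "The Perfect Life", "192.168.1.250", "Tomcat"),
]

def get_ip(server):
    return server[2]

def group_sizes(items, key):
    """Return (key, item_count) for every run that groupby() yields."""
    return [(k, len(list(run))) for k, run in groupby(items, key)]

# Unsorted input: the two 192.168.1.250 entries are not adjacent,
# so they end up in two separate runs.
unsorted_runs = group_sizes(servers, get_ip)

# Sorted by the grouping key: each IP appears exactly once.
sorted_runs = group_sizes(sorted(servers, key=get_ip), get_ip)

print(unsorted_runs)
print(sorted_runs)
```

With unsorted input the same IP shows up in more than one run, which is the kind of duplication seen in the question's output.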
Now you can sort the servers up front, before processing them, for clarity:
sorted_servers = sorted(servers, key=get_ip)
The groupby() iterator yields pairs, each consisting of a key and an iterator over all the items with that key, as you probably already know. Since the key is the IP, I will declare the loop as follows. Note that the key function is the same one that sorted the servers:
for ip, servers in groupby(sorted_servers, get_ip):
Inside the loop, what will we do? For each IP, we will collect the set of all machines, the set of all groups* and the set of all services associated with that IP. First, we create the empty sets:
machine_set = set()
group_set = set()
service_set = set()
Then, we will iterate over all servers yielded by the iterator returned by groupby() for the given IP. For each server, we will add each piece of server info to the corresponding set:
for machine, group, _, service in servers:
    machine_set.add(machine)
    group_set.add(group)
    service_set.add(service)
With that done, we join the machines, groups and services, each set into one string. Then just pass these values to the line() function and print the result:
machines = " / ".join(machine_set)
groups = ", ".join(group_set)
services = ", ".join(service_set)
print line("- " + machines, groups, ip, services)
For clarity, the resulting code follows. You can just replace all the code after the declaration of the line() function with the code below:
def get_ip(server):
    return server[2]

sorted_servers = sorted(servers, key=get_ip)
print line("Machine", "Group", "IP", "Services")
print line("----------", "----------", "----------", "----------")
for ip, servers in groupby(sorted_servers, get_ip):
    machine_set = set()
    group_set = set()
    service_set = set()
    for machine, group, _, service in servers:
        machine_set.add(machine)
        group_set.add(group)
        service_set.add(service)
    machines = " / ".join(machine_set)
    groups = ", ".join(group_set)
    services = ", ".join(service_set)
    print line("- " + machines, groups, ip, services)
The printed result is the one below:
Machine                | Group                  | IP                 | Services
----------             | ----------             | ----------         | ----------
- Tom4                 | The Perfect Life       | 192.168.1.111      | JBoss, Tomcat
- Jboss1 / Tom1        | The Perfect Life       | 192.168.1.250      | JBoss, Tomcat
- Tom2 / Jboss2        | The Perfect Life       | 192.168.1.251      | JBoss, Tomcat
- Jboss3 / Tom3        | The Perfect Life       | 192.168.1.252      | JBoss, Tomcat
- Jboss4               | The Perfect Life       | 192.168.1.253      | JBoss, Tomcat
It is not exactly what you asked for. The lines are sorted by IP, not by the Machine column, because that is how we sorted the servers before grouping. (The Services column is also still wrong for Jboss4 and Tom4; that comes from the nested loops that build the servers list, covered further down.) To sort as you asked, I would propose this: just before the for loop, create an empty list. Instead of printing each line, append a tuple with the values to this list:
lines = []
for ip, servers in groupby(sorted_servers, get_ip):
    # ... Same stuff here
    machines = " / ".join(machine_set)
    groups = ", ".join(group_set)
    services = ", ".join(service_set)
    # No more "print line("- " + machines, groups, ip, services)"
    lines.append((machines, groups, ip, services))
Then, sort the list by the first item of the tuples (the machine names):
lines = sorted(lines, key=lambda l: l[0])
Now iterate over all server tuples and print the lines:
for machine, group, ip, service in lines:
    print line("- " + machine, group, ip, service)
In the beginning of the program, you have no fewer than six nested for loops. Man, this is madness (or SPARTAAA, but both are bad ideas). You can easily remove all this nesting: retrieve all the Server tags directly from the yXML object. From each tag, you can get the server name by calling server.getAttribute('name'). The Group tag is the grandparent of the Server tag, so you can get the group name with server.parentNode.parentNode.getAttribute('name'). The IP can be retrieved from the server tag easily: server.getAttribute('ip'). And the service name is an attribute of the parent of the server tag (the GApp tag), so you can get it this way: server.parentNode.getAttribute('type'). This also fixes the wrong Services column: your innermost loop calls Group.getElementsByTagName('Server'), which returns every server in the group for each GApp, so every server ended up paired with both service types.
Summing up, you can get all the servers with the much smaller loop below:
for server in yXML.getElementsByTagName('Server'):
    name = server.getAttribute('name')
    group = server.parentNode.parentNode.getAttribute('name')
    ip = server.getAttribute('ip')
    service = server.parentNode.getAttribute('type')
    servers.append((name, group, ip, service))
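As a self-contained way to try the flat loop, here is a small sketch that parses a trimmed-down copy of the XML from a string with parseString() instead of reading the file. The SAMPLE_XML constant and its reduced attribute set are assumptions made for illustration. It also compares how many Server tags getElementsByTagName() returns when called on the Group versus on a single GApp.

```python
from xml.dom.minidom import parseString

# Hypothetical, trimmed-down version of the question's XML.
SAMPLE_XML = """<AllConfigurations><DeployConfigurations><Servers>
<Group name="The Perfect Life">
<GApp type="JBoss">
<Server name="Jboss4" ip="192.168.1.253" />
</GApp>
<GApp type="Tomcat">
<Server name="Tom4" ip="192.168.1.111" />
</GApp>
</Group>
</Servers></DeployConfigurations></AllConfigurations>"""

yXML = parseString(SAMPLE_XML)

servers = []
for server in yXML.getElementsByTagName('Server'):
    gapp = server.parentNode    # the enclosing GApp element
    group = gapp.parentNode     # the enclosing Group element
    servers.append((server.getAttribute('name'),
                    group.getAttribute('name'),
                    server.getAttribute('ip'),
                    gapp.getAttribute('type')))

# getElementsByTagName() searches all descendants, so calling it on the
# Group finds the servers of every GApp, not just one.
group_el = yXML.getElementsByTagName('Group')[0]
first_gapp = yXML.getElementsByTagName('GApp')[0]
servers_under_group = len(group_el.getElementsByTagName('Server'))
servers_under_gapp = len(first_gapp.getElementsByTagName('Server'))

print(servers)
print(servers_under_group)
print(servers_under_gapp)
```

Each server is now paired only with the type of its own GApp, so Tom4 is no longer listed as a JBoss machine.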
Remember the Zen of Python:
Flat is better than nested.
Oh, sure, there is still a problem: the machine names within a line are not sorted (see "Tom2 / Jboss2" above). This is easy to repair, however: just sort the sets. Replace the lines below
machines = " / ".join(machine_set)
groups = ", ".join(group_set)
services = ", ".join(service_set)
by the lines below:
machines = " / ".join(sorted(machine_set))
groups = ", ".join(sorted(group_set))
services = ", ".join(sorted(service_set))
In this example, we are sorting all sets, not only the machine names. I bet it is a good choice too.
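Putting the grouping, the sorted sets and the sorted lines together, the whole step could be sketched as below. The sample tuples are hypothetical and assume the parsing step already pairs each server with the correct service; build_rows is a made-up helper name, while get_ip and line() are the functions from above. The print() call takes a single argument, so it should run on both Python 2 and Python 3.

```python
from itertools import groupby

# Hypothetical sample data: (name, group, ip, service) tuples.
servers = [
    ("Jboss1", "The Perfect Life", "192.168.1.250", "JBoss"),
    ("Jboss2", "The Perfect Life", "192.168.1.251", "JBoss"),
    ("Jboss3", "The Perfect Life", "192.168.1.252", "JBoss"),
    ("Jboss4", "The Perfect Life", "192.168.1.253", "JBoss"),
    ("Tom1", "The Perfect Life", "192.168.1.250", "Tomcat"),
    ("Tom2", "The Perfect Life", "192.168.1.251", "Tomcat"),
    ("Tom3", "The Perfect Life", "192.168.1.252", "Tomcat"),
    ("Tom4", "The Perfect Life", "192.168.1.111", "Tomcat"),
]

def get_ip(server):
    return server[2]

def line(machine, group, ip, services):
    return " | ".join([machine.ljust(22), group.ljust(22), ip.ljust(18), services])

def build_rows(servers):
    """Group servers by IP and return rows sorted by the machine column."""
    rows = []
    # Sort and group with the same key function.
    for ip, same_ip in groupby(sorted(servers, key=get_ip), get_ip):
        machine_set, group_set, service_set = set(), set(), set()
        for machine, group, _, service in same_ip:
            machine_set.add(machine)
            group_set.add(group)
            service_set.add(service)
        rows.append((" / ".join(sorted(machine_set)),
                     ", ".join(sorted(group_set)),
                     ip,
                     ", ".join(sorted(service_set))))
    return sorted(rows, key=lambda row: row[0])

rows = build_rows(servers)
print(line("Machine", "Group", "IP", "Services"))
for machines, groups, ip, services in rows:
    print(line("- " + machines, groups, ip, services))
```

Sorting the sets before joining them also makes the output stable from run to run, since set iteration order is not something to rely on.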
I know this answer is inappropriately long but I hope to have both solved your problem and clarified a lot of points.
* It is not even necessary to create a set of groups, since your example only ever shows one group. But I will do it for uniformity.
Upvotes: 2