Reputation: 75
I would like to parse the XML below, extract its tags with as little hard-coding as possible, and convert the result to CSV.
The only names I need to hard-code are these column names: 'InfoGroup', 'InfoRegister', 'RegisterType', 'Measures', 'Description', 'GeneratedOn'.
InfoGroup is the text of each group's name tag, like RecordingSystem, Ports, etc.
InfoRegister is the name attribute of the tags inside the row tag, like closedFileCount, processedFileCount, etc.
RegisterType is the name of the tag that carries that name attribute, like usage, lwGauge, hwGauge, etc.
Measures is the text of the measures tag.
Description is the text of the description tag.
GeneratedOn is the content of the generatedOn tag, like sessmgr, rtpportal, etc.
If the XML contains any other or new tags, I would like them to be added to the CSV automatically.
My current implementation is basically all hard-coded, because I couldn't get it to work any other way. Please run the code against my XML to see what the CSV should look like.
<?xml version="1.0" encoding="UTF-8"?>
<infoconfig xmlns="urn:nortel:namespaces:mcp:oms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:nortel:namespaces:mcp:oms OMSchema.xsd" >
<group>
<name>RecordingSystem</name>
<row>
<package>com.nortelnetworks.mcp.ne.base.recsystem.fw.system</package>
<class>RecSysFileOMRow</class>
<usage name="closedFileCount" hasThresholds="true">
<measures>
closed file count
</measures>
<description>
This register counts the number
of closed files in the spool directory of a
particular stream and a particular system.
Files in the spool directory store the raw
OAM records where they are sent to the
Element Manager for formatting.
</description>
<notes>
Minor and major alarms
when the value of closedFileCount
exceeds certain thresholds. Configure
the threshold values for minor and major
alarms for this OM through engineering
parameters for minorBackLogCount and
majorBackLogCount, respectively. These
engineering parameters are grouped under
the parameter group of Log, OM, and
Accounting for the logs’ corresponding
system.
</notes>
</usage>
<usage name="processedFileCount" hasThresholds="true">
<measures>
Processed file count
</measures>
<description>
The register counts the number
of processed files in the spool directory of
a particular stream and a particular system.
Files in the spool directory store the raw
OAM records and then send the records to
the Element Manager for formatting.
</description>
</usage>
</row>
<documentation>
<description>
Rows of this OM group provide a count of the number of files contained
within the directory (which is the OM row key value).
</description>
<rowKey>
The full name of the directory containing the files counted by this row.
</rowKey>
</documentation>
<generatedOn>
<all/>
</generatedOn>
</group>
<group traffic="true">
<name>Ports</name>
<row>
<package>com.nortelnetworks.ims.cap.mediaportal.host</package>
<class>PortsOMRow</class>
<usage name="rtpMpPortUsage">
<measures>
BCP port usage
</measures>
<description>
Meter showing number of ports in use.
</description>
</usage>
<lwGauge name="connMapEntriesLWM">
<measures>
Lowest simultaneous port usage
</measures>
<description>
Lowest number of
simultaneous ports detected to be in
use during the collection interval
</description>
</lwGauge>
<hwGauge name="connMapEntriesHWM">
<measures>
Highest simultaneous port usage
</measures>
<description>
Highest number of
simultaneous ports detected to be in
use during the collection interval.
</description>
</hwGauge>
<waterMark name="connMapEntries">
<measures>
Connections map entries
</measures>
<description>
Meter showing the number of connections in the host
CPU connection map.
</description>
<bwg lwref="connMapEntriesLWM" hwref="connMapEntriesHWM"/>
</waterMark>
<counter name="portUsageSampleCnt">
<measures>
Usage sample count
</measures>
<description>
The number of 100-second samples taken during the
collection interval contributing to the average report.
</description>
</counter>
<counter name="sampledRtpMpPortUsage">
<measures>
In-use ports usage
</measures>
<description>
Provides the sum of the in-use ports every 100 seconds.
</description>
</counter>
<precollector>
<package>com.nortelnetworks.ims.cap.mediaportal.host</package>
<class>PortsOMCenturyPrecollector</class>
<collector>centurySecond</collector>
</precollector>
</row>
<documentation>
<description>
</description>
<rowKey>
</rowKey>
</documentation>
<generatedOn>
<list>
<ne>sessmgr</ne>
<ne>rtpportal</ne>
</list>
</generatedOn>
</group>
<group traffic="true">
<name>SASIPPBXTrunkGroupCallMgmt</name>
<row>
<package>com.nortelnetworks.ims.cap.svc.sippbx.fsm</package>
<class>StandAloneSipPbxTrunkGroupOMRow</class>
<hwGauge name="callAttemptsHighForOrigination">
<measures></measures>
<description></description>
</hwGauge>
<waterMark name="callAttemptsForOrigination">
<measures> Number of Call attempts </measures>
<description>> This counter will keep track of incoming call attempts of Trunk Group to or from a SIPPBX node </description>
<bwg lwref="callAttemptsLowForOrigination" hwref="callAttemptsHighForOrigination"/>
</waterMark>
<lwGauge name="callAttemptsLowForOrigination">
<measures></measures>
<description></description>
</lwGauge>
<hwGauge name="callAttemptsHighForTermination">
<measures></measures>
<description></description>
</hwGauge>
<waterMark name="callAttemptsForTermination">
<measures> Number of Call attempts </measures>
<description>> This counter will keep track of outgoing call attempts of Trunk Group to or from a SIPPBX node </description>
<bwg lwref="callAttemptsLowForTermination" hwref="callAttemptsHighForTermination"/>
</waterMark>
<lwGauge name="callAttemptsLowForTermination">
<measures></measures>
<description></description>
</lwGauge>
<hwGauge name="activeCallsHighForOrigination">
<measures></measures>
<description></description>
</hwGauge>
<waterMark name="activeCallsForOrigination">
<measures> Number of Incoming Active calls </measures>
<description>> This counter will keep track of incoming active call of Trunk Group to or from a SIPPBX node </description>
<bwg lwref="activeCallsLowForOrigination" hwref="activeCallsHighForOrigination"/>
</waterMark>
<lwGauge name="activeCallsLowForOrigination">
<measures></measures>
<description></description>
</lwGauge>
<hwGauge name="activeCallsHighForTermination">
<measures></measures>
<description></description>
</hwGauge>
<waterMark name="activeCallsForTermination">
<measures> Number of Outgoing Active calls </measures>
<description>> This counter will keep track of outgoing call active call of Trunk Group to or from a SIPPBX node </description>
<bwg lwref="activeCallsLowForTermination" hwref="activeCallsHighForTermination"/>
</waterMark>
<lwGauge name="activeCallsLowForTermination">
<measures></measures>
<description></description>
</lwGauge>
<counter name="deniedCallsDueToCapacityForOrigination">
<measures>Number of Denied Calls due to capacity </measures>
<description>This counter will keep track denied for incoming call attempts of Trunk Group to or from a SIPPBX node </description>
</counter>
<counter name="deniedCallsDueToCapacityForTermination">
<measures>Number of Denied Calls due to capacity </measures>
<description>This counter will keep track denied for outgoing call attempts of Trunk Group to or from a SIPPBX node </description>
</counter>
<counter name="failoverRouteCallAttempts">
<measures>Number of FailOverRoute Call attempts </measures>
<description>This counter will keep track of FailOverRoute Call attempts of Trunk Group for a SIPPBX node </description>
</counter>
</row>
<documentation>
<description></description>
<rowKey></rowKey>
</documentation>
<generatedOn>
<list>
<ne>sessmgr</ne>
</list>
</generatedOn>
</group>
</infoconfig>
from bs4 import BeautifulSoup
import re
import csv

def extract_data_from_report3():
    # Note: the 'lxml' HTML parser lowercases tag names (lwGauge -> lwgauge).
    with open('infoconfig.xml', 'r') as xmlfile:
        soup = BeautifulSoup(xmlfile, 'lxml')
    with open('data2.csv', 'w', newline='') as f_out:
        writer = csv.writer(f_out)
        writer.writerow(['InfoGroup:InfoRegister', 'InfoGroup', 'InfoRegister', 'RegisterType', 'Measures', 'Description', 'GeneratedOn'])
        # Every tag inside <row> that carries a name attribute is one register.
        for item in soup.select('row [name]'):
            desc = getattr(item.find('description'), 'text', None)
            desc = str(desc)
            desc = re.sub(r'\s{2,}', ' ', desc)  # collapse runs of whitespace
            generatedOn = ','.join(ne.get_text(strip=True) for ne in item.find_parent('group').select('ne'))
            writer.writerow([item.find_previous('name').text + ':' + item['name'], item.find_previous('name').text, item['name'], item.name, item.find('measures').get_text(strip=True), desc, generatedOn])
    print("File successfully converted to CSV")
Any help would be greatly appreciated
Upvotes: 0
Views: 172
Reputation: 331
I still don't understand the rules for the other new tags you mentioned, but I rewrote the code following your current logic. We can iterate from here until it produces the result you want.
from simplified_scrapy import SimplifiedDoc, utils

def extract_data_from_report3():
    header = [
        'InfoGroup:InfoRegister', 'InfoGroup', 'InfoRegister', 'RegisterType', 'GeneratedOn'  # edit
    ]
    datas = []
    doc = SimplifiedDoc(utils.getFileContent('infoconfig.xml'))
    groups = doc.selects('group')
    for group in groups:
        name = group.select('name>text()')
        # generatedOn = ','.join(group.selects('generatedOn>ne>text()'))
        # edit start...
        all = group.select('generatedOn').child
        if not all.child:
            generatedOn = all.tag
        else:
            generatedOn = ','.join(all.selects('ne>text()'))
        # edit end...
        RegisterTypes = group.row.children.containsReg(
            '.+', attr='name')  # The node with the name attribute.
        for registerType in RegisterTypes:
            extr = {}
            for c in registerType.children:
                if c['tag'] not in header:
                    header.append(c['tag'])
                extr[c['tag']] = c.text  # edit
            datas.append([
                '{}:{}'.format(name, registerType['name']), name,
                registerType['name'], registerType['tag'], generatedOn, extr])
    rows = [header]
    for data in datas:
        row = data[:-1]
        extr = data[-1]
        for i in range(5, len(header)):  # edit
            row.append(extr.get(header[i]))
        rows.append(row)
    utils.save2csv('data.csv', rows, newline='')

extract_data_from_report3()
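For comparison, here is a rough standard-library sketch of the same idea using xml.etree.ElementTree and csv.DictWriter. It is not a drop-in replacement for the code above. The helper names (local, clean, generated_on, registers_to_rows, write_csv, convert) are made up for illustration, and it assumes that every child tag of a register should become a column named after the tag with its first letter capitalized (measures -> Measures, notes -> Notes). Tags that carry only attributes, such as bwg, end up as empty cells in this sketch.

```python
import csv
import xml.etree.ElementTree as ET

# Default namespace declared on <infoconfig> in the sample file.
NS = {"om": "urn:nortel:namespaces:mcp:oms"}
# Columns that do not come from child tags of a register.
FIXED = ["InfoGroup", "InfoRegister", "RegisterType", "GeneratedOn"]


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def clean(text):
    """Collapse runs of whitespace; missing text becomes ''."""
    return " ".join((text or "").split())


def generated_on(group):
    """Join the <ne> values, or fall back to the first child's tag (e.g. 'all')."""
    gen = group.find("om:generatedOn", NS)
    if gen is None or len(gen) == 0:
        return ""
    nes = [clean(ne.text) for ne in gen.iterfind(".//om:ne", NS)]
    return ",".join(nes) if nes else local(gen[0].tag)


def registers_to_rows(root):
    """Return (header, rows); the header grows as new child tags are seen."""
    header = list(FIXED)
    rows = []
    for group in root.findall("om:group", NS):
        group_name = clean(group.findtext("om:name", default="", namespaces=NS))
        gen_on = generated_on(group)
        for row in group.findall("om:row", NS):
            for reg in row:
                if "name" not in reg.attrib:
                    continue  # package, class, precollector, ...
                record = {
                    "InfoGroup": group_name,
                    "InfoRegister": reg.get("name"),
                    "RegisterType": local(reg.tag),
                    "GeneratedOn": gen_on,
                }
                for child in reg:
                    tag = local(child.tag)
                    col = tag[:1].upper() + tag[1:]  # assumed naming rule
                    if col not in header:
                        header.append(col)
                    record[col] = clean(child.text)
                rows.append(record)
    return header, rows


def write_csv(header, rows, out):
    """Write the rows; registers lacking a column get an empty cell."""
    writer = csv.DictWriter(out, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(rows)


def convert(xml_path, csv_path):
    """Parse xml_path and write the flattened registers to csv_path."""
    header, rows = registers_to_rows(ET.parse(xml_path).getroot())
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        write_csv(header, rows, f)
```

Calling convert('infoconfig.xml', 'data.csv') should write one line per register. Because the header is only complete after the whole file has been scanned, the rows are collected first and written at the end, which is the same approach the code above takes with its header list.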
Upvotes: 2