I have some XML data that needs to be parsed so that certain information can be extracted. However, I run into a problem when I try to extract the name field with BeautifulSoup from elements such as:
<attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
I have to use BeautifulSoup and can't switch to another package, so a workaround that uses it would be much appreciated.
Below is the XML data. For each attribute-item I need to extract the name, the description and the option value:
<configurations>
  <attributes-configuration>
    <attributes>
      <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
        <name>priority</name>
        <description>priority of a requirement</description>
        <customization-element>mydata.core.customization.requirements</customization-element>
        <attribute-type>mydata.attribute_type.list</attribute-type>
        <options>
          <option>
            <key>DEFAULT_LIST</key>
            <value class="java.lang.String"> high,low,medium</value>
          </option>
          <option>
            <key>LIST_TYPE</key>
            <value class="java.lang.String">CUSTOM</value>
          </option>
        </options>
        <editable>true</editable>
        <userDefined>true</userDefined>
        <internal>false</internal>
      </attribute-item>
      <attribute-item id="mydata.core.customization.teststep.prerequisite">
        <name>Prerequisite</name>
        <description>User Defined Attribute</description>
        <customization-element>mydata.core.customization.teststep</customization-element>
        <attribute-type>mydata.attribute_type.string</attribute-type>
        <options>
          <option>
            <key>DEFAULT_VALUE</key>
            <value/>
          </option>
          <option>
            <key>MAX_CHARACTERS</key>
            <value class="java.lang.String">5000</value>
          </option>
        </options>
        <editable>true</editable>
        <userDefined>true</userDefined>
        <internal>false</internal>
      </attribute-item>
    </attributes>
  </attributes-configuration>
  <test-management/>
</configurations>
Below is my Python code:
import os
from bs4 import BeautifulSoup as bs

fileName = 'Configuration.xml'
fullFile = os.path.abspath(os.path.join('DataTransporter', fileName))
attributeList = []

with open(fullFile) as f:
    soup = bs(f, 'xml')  # the 'xml' parser requires lxml to be installed

for attribData in soup.find_all('attribute-item'):
    dat = {
        'attribName' : attribData.name,  # gives 'attribute-item' (see output below)
        'attribDesc' : attribData.description.text,
        'attribValue' : attribData.options.value.text,
    }
    attributeList.append(dat)
    #for attribParams in soup.find_all(name = 'value'):
    #    newdict[attribName.text] = attribParams.text

print(attributeList)
My Output:
[{'attribName': 'attribute-item', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'attribute-item', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]
Expected output:
[{'attribName': 'priority', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'Prerequisite', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]
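At first I thought that attribData.name.text should do it, but name is not looked up as a child element. On a BeautifulSoup Tag, .name is a built-in attribute that holds the tag's own name, which is why your output shows 'attribute-item' (and attribData.name.text raises AttributeError, because .name is a plain string).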
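The clash can be seen in a minimal sketch on a trimmed copy of the first attribute-item. It assumes the built-in 'html.parser' only so that it runs without lxml; with the 'xml' parser from the question these particular lookups should behave the same, since all tag names here are lowercase.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first <attribute-item> from the question's XML.
XML = """
<attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
  <name>priority</name>
  <description>priority of a requirement</description>
</attribute-item>
"""

# 'html.parser' is an assumption so the sketch runs without lxml.
soup = BeautifulSoup(XML, "html.parser")
item = soup.find("attribute-item")

# Tag.name is the element's own tag name (a str), not a child lookup.
tag_name = item.name

# Tag has no attribute called 'description', so item.description
# falls back to item.find("description").
desc = item.description.text

# An explicit find() reaches the <name> child that dot access cannot.
child_name = item.find("name").text

# item.name.text fails because item.name is a plain str.
error = None
try:
    item.name.text
except AttributeError as exc:
    error = exc

print(tag_name, "|", desc, "|", child_name, "|", type(error).__name__)
```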
To get the correct value, use the findChildren(<tag name>) method instead:
attribData.findChildren('name')[0].text
findChildren() returns a list, which in this case contains only one element, so [0] selects that element and .text gives the expected value.
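In bs4, findChildren() is an older alias of find_all(), so it searches all descendants, not only direct children. The sketch below compares it with find(), which returns the first match directly (or None), and with recursive=False, which limits the search to direct children. The nested <extra><name> element is hypothetical and only there to show the difference; in the question's XML each attribute-item has a single direct <name>, so all three give the same result.

```python
from bs4 import BeautifulSoup

# Trimmed <attribute-item>; <extra><name>nested</name></extra> is a
# hypothetical addition that shows the search is recursive.
XML = """
<attribute-item id="mydata.core.customization.teststep.prerequisite">
  <extra><name>nested</name></extra>
  <name>Prerequisite</name>
</attribute-item>
"""

item = BeautifulSoup(XML, "html.parser").find("attribute-item")

# findChildren() (alias of find_all()) returns every matching descendant
# as a list, in document order.
all_names = [t.text for t in item.findChildren("name")]

# find() returns the first matching descendant directly, so no [0] is needed.
first_name = item.find("name").text

# recursive=False only considers direct children of <attribute-item>.
direct_name = item.find("name", recursive=False).text

print(all_names, first_name, direct_name)
```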
To get the id, use attribData['id'].
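Square-bracket access reads the tag's attributes like a dict, so it raises KeyError when the attribute is missing, while Tag.get() returns None (or a default you pass). A small sketch, in which the second, id-less attribute-item is a hypothetical case not present in the question's XML:

```python
from bs4 import BeautifulSoup

# The second <attribute-item> has no id attribute (hypothetical case).
XML = """
<attribute-item id="mydata.core.customization.teststep.prerequisite"></attribute-item>
<attribute-item></attribute-item>
"""

items = BeautifulSoup(XML, "html.parser").find_all("attribute-item")

# item["id"] reads the id attribute when it exists.
first_id = items[0]["id"]

# .get() returns None, or the given default, instead of raising.
ids = [item.get("id") for item in items]
ids_with_default = [item.get("id", "<no id>") for item in items]

# item["id"] on the id-less element raises KeyError.
missing_raises = False
try:
    items[1]["id"]
except KeyError:
    missing_raises = True

print(first_id, ids, ids_with_default, missing_raises)
```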
In summary, your code would look like this (inside the for loop):
    dat = {
        'attribName' : attribData.findChildren('name')[0].text,
        'id': attribData['id'],
        'attribDesc' : attribData.description.text,
        'attribValue' : attribData.options.value.text,
    }
The output would look like this:
[{'attribName': 'priority', 'id': 'mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'Prerequisite', 'id': 'mydata.core.customization.teststep.prerequisite', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]
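For reference, a self-contained sketch of the same loop, run on an inline copy of the question's XML instead of reading DataTransporter/Configuration.xml. The helper name extract_attributes is hypothetical, the option lists are trimmed to the first <option> (the only one the loop reads), 'html.parser' is used so it runs without lxml, and find('name') stands in for findChildren('name')[0], which returns the same first match.

```python
from bs4 import BeautifulSoup

# Inline copy of the question's XML, trimmed to the elements the loop reads.
XML = """
<configurations>
  <attributes-configuration>
    <attributes>
      <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
        <name>priority</name>
        <description>priority of a requirement</description>
        <options>
          <option>
            <key>DEFAULT_LIST</key>
            <value class="java.lang.String"> high,low,medium</value>
          </option>
        </options>
      </attribute-item>
      <attribute-item id="mydata.core.customization.teststep.prerequisite">
        <name>Prerequisite</name>
        <description>User Defined Attribute</description>
        <options>
          <option>
            <key>DEFAULT_VALUE</key>
            <value/>
          </option>
        </options>
      </attribute-item>
    </attributes>
  </attributes-configuration>
</configurations>
"""


def extract_attributes(xml_text):
    """Hypothetical helper: name, id, description and first option value per item."""
    soup = BeautifulSoup(xml_text, "html.parser")
    attribute_list = []
    for attrib_data in soup.find_all("attribute-item"):
        attribute_list.append({
            # find("name") looks up the <name> child; attrib_data.name
            # would return the tag name 'attribute-item' instead.
            "attribName": attrib_data.find("name").text,
            "id": attrib_data["id"],
            "attribDesc": attrib_data.description.text,
            # First <value> under <options>; an empty <value/> gives ''.
            "attribValue": attrib_data.options.value.text,
        })
    return attribute_list


print(extract_attributes(XML))
```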
I hope it helps!