Reputation: 914
I've got an XML-ish file that I'm trying to parse with BeautifulSoup. It contains an unknown number of tags nested inside another tag. Things go swimmingly, at least for the first tag I extract from the set of nested tags. This isn't really HTML or XML, but it's close...
Given the format:
<data>
<type>
<type_attribute_1>1</type_attribute_1>
<type_attribute_2>2</type_attribute_2>
</type>
<type>
<type_attribute_1>3</type_attribute_1>
<type_attribute_2>4</type_attribute_2>
</type>
</data>
How might I extract the values of type_attribute_1 and type_attribute_2 for both type tags and assign each to its own variable, i.e. "Type_1_attribute_1", "Type_1_attribute_2", "Type_2_attribute_1" and "Type_2_attribute_2"?
I'm using code like this, but it only works on the first <type> located within the <data>:
Type_1_Attribute_1 = soup.data.type.type_attribute_1.text
Type_1_Attribute_2 = soup.data.type.type_attribute_2.text
UPDATE
I think phrasing what I'm looking for a little differently may help. Since I don't know how many Type siblings there are, I don't want to declare the variable name Type_1_Attribute_1 by hand. Instead I'd like to tack "_1", "_2", "_3"... onto "Type" for each sibling, i.e.
Assuming:
Types = soup.select('type')  # keep the tag objects so their children stay reachable
parseables = len(Types)
for i in range(0, parseables):
    j = i + 1
    Type = Types[i]
    Attribute_1 = Type.type_attribute_1.text
    print(Attribute_1)
This prints the value of Attribute_1 for each Type. How would I add "Type_j" to the name Attribute_1, with j's value filled in?
Upvotes: 2
Views: 1706
Reputation: 5302
What about this-
from bs4 import BeautifulSoup as bs
data = """<data>
<type>
<type_attribute_1>1</type_attribute_1>
<type_attribute_2>2</type_attribute_2>
</type>
<type>
<type_attribute_1>3</type_attribute_1>
<type_attribute_2>4</type_attribute_2>
</type>
</data>"""
soup = bs(data,'lxml')
Type_1_Attribute_1 = [i.text.strip() for i in soup.select('type_attribute_1')]
Type_1_Attribute_2 = [i.text.strip() for i in soup.select('type_attribute_2')]
# drop empty strings; list() keeps the printed result a list on Python 3 as well
print(list(filter(bool, Type_1_Attribute_1)))
print(list(filter(bool, Type_1_Attribute_2)))
Output-
[u'1', u'3']
[u'2', u'4']
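If the values need to stay grouped by the <type> they came from, one option is to loop over each <type> tag instead of selecting the attribute tags across the whole document. This is only a rough sketch, assuming the same sample data and the built-in "html.parser" (swapped in for 'lxml' here so nothing extra has to be installed). The name rows is just illustrative.

```python
from bs4 import BeautifulSoup

data = """<data>
<type>
<type_attribute_1>1</type_attribute_1>
<type_attribute_2>2</type_attribute_2>
</type>
<type>
<type_attribute_1>3</type_attribute_1>
<type_attribute_2>4</type_attribute_2>
</type>
</data>"""

# "html.parser" is an assumption; the answer above uses 'lxml'
soup = BeautifulSoup(data, "html.parser")

rows = []
for type_tag in soup.find_all("type"):
    # find_all(True) matches every tag nested under this <type>
    row = {child.name: child.get_text(strip=True)
           for child in type_tag.find_all(True)}
    rows.append(row)

print(rows)
```

Each entry of rows then holds the attributes of one <type>, so rows[0]["type_attribute_1"] is the first type's first attribute, however many <type> tags there are.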
EDIT: I don't follow why you need this. When you loop over the list, the loop variable is itself a variable that takes each value in turn, e.g.
for i in Type_1_Attribute_1:
    print(i)  # i is itself a variable, and its value changes on every iteration
Prints-
1
3
So if you need to use every item from that list, just iterate over it and pass each item to a function, as I passed i to the print function.
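If you really do want names like "Type_1_Attribute_1", a dictionary keyed by the generated name is probably easier to work with than creating real variables on the fly. A minimal sketch, again assuming the sample data from the question and "html.parser" rather than 'lxml'. The dictionary name values and the key format are made up for illustration.

```python
from bs4 import BeautifulSoup

data = """<data>
<type>
<type_attribute_1>1</type_attribute_1>
<type_attribute_2>2</type_attribute_2>
</type>
<type>
<type_attribute_1>3</type_attribute_1>
<type_attribute_2>4</type_attribute_2>
</type>
</data>"""

# "html.parser" is an assumption; any parser BeautifulSoup supports should do
soup = BeautifulSoup(data, "html.parser")

values = {}
# j counts the <type> siblings, k counts the tags nested inside each one
for j, type_tag in enumerate(soup.find_all("type"), start=1):
    for k, child in enumerate(type_tag.find_all(True), start=1):
        key = "Type_%d_Attribute_%d" % (j, k)
        values[key] = child.get_text(strip=True)

print(values["Type_2_Attribute_1"])  # prints 3
```

The keys are built from the loop counters, so this keeps working when the number of <type> siblings is not known in advance.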
Upvotes: 2