Good morning, I am working on code to extract data from an XML-TEI marked-up file of a poem, and I would like to print the list of 'pos' attributes for each line of the poem ('l'). ('w' is the word tag contained within each 'l' tag.)
<l n="1">
    <w pos="PREP" msd="--" lemma="de">De</w>
    <w pos="REL" msd="--" lemma="qui">qua</w>
    <w pos="ADV" msd="--" lemma="saepe">saepe</w>
    <w pos="PRON" msd="--" lemma="tu">tibi</w>
    <w pos="PUN" msd="--" lemma=",">,</w>
</l>
<l n="2">
    <w pos="ADV" msd="--" lemma="non">non</w>
    <w pos="V" msd="IND" lemma="licet_est">licet</w>
</l>
<l n="3">
    <w pos="PREP" msd="--" lemma="de">de</w>
    <w pos="REL" msd="--" lemma="qui">qua</w>
    <w pos="ADV" msd="--" lemma="saepe">saepe</w>
</l>
result_4 = bs_content.find_all('l')
for x in result_4:
    print(len(x.find_all('w')))
    for w in x.find_all('w'):
        a = w.get('pos')
        print(a)
The result is currently the following:
5
PREP
REL
ADV
PRON
PUN
2
ADV
V
3
PREP
REL
ADV
But I would like to have:
5
['PREP', 'REL', 'ADV', 'PRON', 'PUN']
2
['ADV', 'V']
3
['PREP', 'REL', 'ADV']
Could anyone help me? Thanks.
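For reference, here is a minimal sketch of one way to get that shape: collect the 'pos' values of each line into a list first, then print the list once per line instead of printing inside the inner loop. To keep the sketch self-contained it uses the standard library's xml.etree.ElementTree rather than BeautifulSoup. The `<lg>` wrapper element, the `SAMPLE` string, and the `pos_per_line` helper are assumptions made for illustration, and the fragment is assumed to carry no XML namespace (a full TEI file usually declares one). With BeautifulSoup the same idea should carry over as `[w.get('pos') for w in x.find_all('w')]`.

```python
import xml.etree.ElementTree as ET

# Sample fragment from the question, wrapped in a hypothetical <lg> root
# so that it parses as a single well-formed XML document.
SAMPLE = """<lg>
<l n="1">
    <w pos="PREP" msd="--" lemma="de">De</w>
    <w pos="REL" msd="--" lemma="qui">qua</w>
    <w pos="ADV" msd="--" lemma="saepe">saepe</w>
    <w pos="PRON" msd="--" lemma="tu">tibi</w>
    <w pos="PUN" msd="--" lemma=",">,</w>
</l>
<l n="2">
    <w pos="ADV" msd="--" lemma="non">non</w>
    <w pos="V" msd="IND" lemma="licet_est">licet</w>
</l>
<l n="3">
    <w pos="PREP" msd="--" lemma="de">de</w>
    <w pos="REL" msd="--" lemma="qui">qua</w>
    <w pos="ADV" msd="--" lemma="saepe">saepe</w>
</l>
</lg>"""


def pos_per_line(root):
    """Return one list of 'pos' attribute values per <l> element."""
    return [
        [w.get("pos") for w in line.findall("w")]  # one inner list per line
        for line in root.iter("l")
    ]


root = ET.fromstring(SAMPLE)
for tags in pos_per_line(root):
    print(len(tags))  # number of words in the line
    print(tags)       # the whole list, printed once per line
```

The key change from the original loop is that nothing is printed per word: the list comprehension builds the full list for a line, and `print` is called on that list.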