Andrea

Reputation: 47

Extract XML-TEI attributes from a file

Good morning, I am writing code to extract data from an XML-TEI marked-up file of a poem. For each line of the poem ('l'), I would like to print the list of the 'pos' attributes of its words ('w' is the word tag contained within the 'l' tag).

<l n="1">
  <w pos="PREP" msd="--" lemma="de">De</w>
  <w pos="REL" msd="--" lemma="qui">qua</w>
  <w pos="ADV" msd="--" lemma="saepe">saepe</w>
  <w pos="PRON" msd="--" lemma="tu">tibi</w>
  <w pos="PUN" msd="--" lemma=",">,</w>
</l>
<l n="2">
  <w pos="ADV" msd="--" lemma="non">non</w>
  <w pos="V" msd="IND" lemma="licet_est">licet</w>
</l>
<l n="3">
  <w pos="PREP" msd="--" lemma="de">de</w>
  <w pos="REL" msd="--" lemma="qui">qua</w>
  <w pos="ADV" msd="--" lemma="saepe">saepe</w>
</l>

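# Every <l> (verse line) element in the parsed document
result_4 = bs_content.find_all('l')
for line in result_4:
    # Number of <w> (word) elements in this line
    print(len(line.find_all('w')))
    for word in line.find_all('w'):
        a = word.get('pos')
        print(a)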

The result is currently the following:

5
PREP
REL
ADV
PRON
PUN
2
ADV
V
3
PREP
REL
ADV

But I would like to get:

5
['PREP', 'REL', 'ADV', 'PRON', 'PUN']
2
['ADV', 'V']
3
['PREP', 'REL', 'ADV']

Can anyone help me? Thanks.
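
One possible approach is to build a list of 'pos' values for each 'l' element and print that list once, instead of printing each value inside the inner loop. The following is only a minimal sketch: to stay self-contained it uses the standard library's xml.etree.ElementTree rather than BeautifulSoup, and it wraps the sample lines in a hypothetical <lg> root element so the fragment parses as a single document. The names SAMPLE and pos_by_line are made up for illustration. A real TEI file usually declares a namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Sample lines from the question, wrapped in a hypothetical <lg> root
# (an assumption) so that the fragment is a well-formed XML document.
SAMPLE = """
<lg>
  <l n="1">
    <w pos="PREP" msd="--" lemma="de">De</w>
    <w pos="REL" msd="--" lemma="qui">qua</w>
    <w pos="ADV" msd="--" lemma="saepe">saepe</w>
    <w pos="PRON" msd="--" lemma="tu">tibi</w>
    <w pos="PUN" msd="--" lemma=",">,</w>
  </l>
  <l n="2">
    <w pos="ADV" msd="--" lemma="non">non</w>
    <w pos="V" msd="IND" lemma="licet_est">licet</w>
  </l>
  <l n="3">
    <w pos="PREP" msd="--" lemma="de">de</w>
    <w pos="REL" msd="--" lemma="qui">qua</w>
    <w pos="ADV" msd="--" lemma="saepe">saepe</w>
  </l>
</lg>
"""


def pos_by_line(xml_text):
    """Return one list of 'pos' values per <l> element, in document order."""
    root = ET.fromstring(xml_text.strip())
    # The inner comprehension collects the attribute values for one line;
    # the outer comprehension repeats that for every <l> element.
    return [[w.get("pos") for w in line.iter("w")] for line in root.iter("l")]


for tags in pos_by_line(SAMPLE):
    print(len(tags))  # number of words in the line
    print(tags)       # all 'pos' values of the line as a single list
```

With BeautifulSoup, the same pattern should carry over to the loop in the question: replace the inner loop with a comprehension such as [w.get('pos') for w in x.find_all('w')] and print the resulting list once per 'l' element.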

Upvotes: 0

Views: 108

Answers (0)
