Rhys

Reputation: 5282

Python: list everything between two tags

I'm looking for the shortest, neatest way to code the following.

Say I have a string containing: 'the f<ox jumpe>d over the l<azy> dog <and the >fence'

Using < as the opening tag and > as the closing tag, I would like to save everything in between into a list.

If saved into list1, list1 would equal ['ox jumpe', 'azy', 'and the ']

Does anyone know of a nice, neat, SHORT way to do this?

Thanks!

Upvotes: 3

Views: 2605

Answers (2)

Rusty Rob

Reputation: 17173

Assuming every "<" and every ">" marks the start or end of a tag (e.g. you can't have <hi<there>):

>>> x="<a><bb><ccc>"
>>> starts=(i for i,c in enumerate(x) if c=="<")
>>> ends=(i for i,c in enumerate(x) if c==">")
>>> ans=[x[i+1:j] for i,j in zip(starts,ends)]
>>> ans
['a', 'bb', 'ccc']

Use itertools.izip (Python 2) if it is a large XML file, to save memory, although x[i+1:j] would need to be changed, as you wouldn't want the whole file as a string. In Python 3 the built-in zip is already lazy, so izip is not needed there.
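As a rough sketch, the same index-pairing idea can be wrapped in a small function. The name between_tags and its parameters are made up for illustration, and it keeps the assumption above that tags are never nested or unbalanced:

```python
def between_tags(text, open_tag="<", close_tag=">"):
    """Return the substrings between each open/close pair, in order.

    Hypothetical helper: assumes single-character tags that are
    never nested and always come in matching pairs.
    """
    # Lazily yield the index of every opening and closing character.
    starts = (i for i, c in enumerate(text) if c == open_tag)
    ends = (i for i, c in enumerate(text) if c == close_tag)
    # Pair the n-th opening index with the n-th closing index and slice.
    return [text[i + 1:j] for i, j in zip(starts, ends)]


sample = 'the f<ox jumpe>d over the l<azy> dog <and the >fence'
result = between_tags(sample)
print(result)  # ['ox jumpe', 'azy', 'and the ']
```

If the input contains a stray "<" or ">", the pairs go out of step and the slices will be wrong, so this only suits input that is known to be well formed.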

Upvotes: 1

Tudor Constantin

Reputation: 26861

Regular expressions should do the trick here:

import re

text = 'the f<ox jumpe>d over the l<azy> dog <and the >fence'
# Raw string keeps the backslashes intact; "matches" avoids shadowing the built-in list.
matches = re.findall(r'.*?\<(.*?)\>.*?', text)

print(matches)

Edit:

You can read more about regex here

In short, this is what the regex above does:

.*? - non-greedy match of all the characters until the next wanted char

\< - matches the < char

(.*?) - non-greedy match of all the characters until the next wanted char; captures and returns them

\> - matches the > char
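For comparison, here is a small sketch of a shorter pattern that appears to give the same result on the question's string. It relies on re.findall scanning the whole string by itself, so the leading and trailing .*? can be dropped, and < and > need no escaping in Python's re syntax. The variable names are illustrative only:

```python
import re

text = 'the f<ox jumpe>d over the l<azy> dog <and the >fence'

# Pattern from the answer, written as a raw string.
original = re.findall(r'.*?\<(.*?)\>.*?', text)

# Shorter variant: findall already searches from every position,
# so only the tag delimiters and the capture group are needed.
shorter = re.findall(r'<(.*?)>', text)

print(original)  # ['ox jumpe', 'azy', 'and the ']
print(shorter)   # ['ox jumpe', 'azy', 'and the ']
```

Both patterns use a non-greedy group, so each match stops at the first > after a <. Note that . does not match newlines by default, so tags spanning several lines would need the re.DOTALL flag.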

Upvotes: 5
