user1601716

Reputation: 1993

Looking to pull out data between two strings

In Python, I am trying to use a regex to pull out the information between two strings. An example should make this clearer.

<stuff>
1
2
3
4
</stuff>

<stuff>
5
7
8
9
</stuff>

I am trying to pull out one of these containers at a time and place each one in a separate file. I have found out how to pull the content between the tags (e.g., 5 6 7 8), and how to pull all of the records at the same time (in bash), but I have not managed to get the full container, including the tags matched by the regex, into a variable or a file that I can work with.

So I want to collect everything between and including <stuff> and </stuff>.

Any advice would be greatly appreciated. I am trying to do this in Python 2.
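
To make the goal concrete, here is a rough sketch of the kind of result being described, assuming the input looks exactly like the sample above and the tags are never nested. The names `text`, `pattern`, and `containers` are only illustrative, and a non-greedy regex like this is fragile on real markup.

```python
import re

# Sample input copied from the example above.
text = """<stuff>
1
2
3
4
</stuff>

<stuff>
5
7
8
9
</stuff>
"""

# Non-greedy match from each opening tag to the nearest closing tag.
# re.DOTALL lets "." match newlines, so a container can span several lines.
pattern = re.compile(r"<stuff>.*?</stuff>", re.DOTALL)

# Each list item is one full container, tags included.
containers = pattern.findall(text)

for container in containers:
    print(container)
```

Each entry in `containers` is a plain string, so it can be kept in a variable or written out to its own file.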

Upvotes: 0

Views: 221

Answers (2)

subiet

Reputation: 1399

If this is a simplified picture of grabbing data out of an HTML page, then I would strongly recommend against using regex (search SO for why).

Use BeautifulSoup or lxml. Much better, much more powerful.

Upvotes: 1

Maksim Skurydzin

Reputation: 10541

If you need to parse data in XML format, you can try the facilities in the xml.etree.ElementTree module.

from xml.etree.ElementTree import XML
single_item_data = XML("<stuff>1 2 3</stuff>").text
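
Since the question asks for the container including its tags, `.text` alone may not be enough. A minimal sketch, assuming the element is small enough to hold in memory: `tostring` from the same module serializes the whole element. The variable names here are illustrative.

```python
from xml.etree.ElementTree import XML, tostring

element = XML("<stuff>1 2 3</stuff>")

# Only the text between the tags.
inner_text = element.text

# The whole element, tags included. tostring() returns a byte string,
# so decode it to get text on both Python 2 and Python 3.
full_container = tostring(element).decode("ascii")

print(inner_text)
print(full_container)
```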

If you have some nested elements, you can do something like this:

from xml.etree.ElementTree import XML

test_input_xml = '''
<lotsOfStuff>
   <stuff>
   1
   2
   3
   4
   </stuff>

   <stuff>
   5
   7
   8
   9
   </stuff>
</lotsOfStuff>
'''

test_input = XML(test_input_xml)
stuffs = test_input.findall("stuff")

for stuff in stuffs:
    element_text = stuff.text
    print(element_text)  # parentheses keep this valid on Python 2 and 3
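
To get each container into its own file, something along these lines might work. It is a sketch under a few assumptions: the raw input has no single root element (as in the question), so it is wrapped in one before parsing; the output goes to a temporary directory; and the `stuff_N.xml` file names are made up for illustration.

```python
import os
import tempfile
from xml.etree.ElementTree import XML, tostring

# Input as given in the question: two top-level containers, no root element.
raw_input_text = """<stuff>
1
2
3
4
</stuff>

<stuff>
5
7
8
9
</stuff>
"""

# XML needs exactly one root element, so wrap the input before parsing.
root = XML("<lotsOfStuff>" + raw_input_text + "</lotsOfStuff>")

output_dir = tempfile.mkdtemp()  # assumed location; use any directory
written_paths = []

for index, stuff in enumerate(root.findall("stuff"), 1):
    # The tail is the whitespace after the closing tag; drop it so the
    # file holds only the container itself.
    stuff.tail = None
    path = os.path.join(output_dir, "stuff_%d.xml" % index)
    with open(path, "wb") as handle:
        handle.write(tostring(stuff))  # byte string, tags included
    written_paths.append(path)

print(written_paths)
```

Each file then holds one full container, from `<stuff>` through `</stuff>`.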

Upvotes: 1
