Reputation: 1993
In Python, I am trying to use a regex to pull out the information between two strings. An example should make this clearer:
<stuff>
1
2
3
4
</stuff>
<stuff>
5
7
8
9
</stuff>
I am trying to pull one of these containers at a time and place each in a separate file. I have found out how to pull the content between the tags (e.g. 5 7 8 9) and how to pull all of the records at once (in bash), but I have not managed to get the full container, including the strings the regex matches on, into a variable or a file that I can work with.
So I want to collect everything between and including <stuff> and </stuff>.
Any advice would be greatly appreciated. I am working in Python 2 for this.
Upvotes: 0
Views: 221
Reputation: 1399
If this is a simplified picture of grabbing data out of an HTML page, then I would strongly recommend against regex (search SO for why).
Use BeautifulSoup or lxml instead. Both are much better and much more powerful for this.
Upvotes: 1
Reputation: 10541
If you need to parse data in XML format, you can try the xml.etree.ElementTree module.
from xml.etree.ElementTree import XML
single_item_data = XML("<stuff>1 2 3</stuff>").text
If you have nested elements, you can do something like this:
from xml.etree.ElementTree import XML
test_input_xml = '''
<lotsOfStuff>
<stuff>
1
2
3
4
</stuff>
<stuff>
5
7
8
9
</stuff>
</lotsOfStuff>
'''
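# Parse the whole document, then find every direct <stuff> child of the root.
test_input = XML(test_input_xml)
stuffs = test_input.findall("stuff")
for stuff in stuffs:
    # .text holds only the content between the tags, not the tags themselves.
    element_text = stuff.text
    print(element_text)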
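The loop above prints only the text inside each element, whereas the question asks for the whole container, tags included, one per file. A minimal sketch of one way to get that with the same module is to serialize each element with `tostring`. The helper names `extract_containers` and `write_containers`, the `stuff_N.xml` file naming, and the temporary output directory are all assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile
from xml.etree.ElementTree import XML, tostring


def extract_containers(xml_text, tag="stuff"):
    """Return every direct <tag> child of the root as a string, tags included."""
    root = XML(xml_text)
    containers = []
    for element in root.findall(tag):
        # tostring() serializes the element together with its opening and
        # closing tags. It returns a byte string, so decode it; strip() drops
        # any whitespace that followed the closing tag.
        containers.append(tostring(element).decode("utf-8").strip())
    return containers


def write_containers(containers, directory):
    """Write each container to its own file (hypothetical naming scheme)."""
    paths = []
    for index, container in enumerate(containers):
        path = os.path.join(directory, "stuff_%d.xml" % index)
        with open(path, "w") as handle:
            handle.write(container + "\n")
        paths.append(path)
    return paths


sample = "<lotsOfStuff><stuff>\n1\n2\n</stuff><stuff>\n5\n7\n</stuff></lotsOfStuff>"
containers = extract_containers(sample)
# A temporary directory is used here only so the sketch has no side effects
# on the current working directory.
paths = write_containers(containers, tempfile.mkdtemp())
```

Note that this assumes the input is well-formed XML with a single root element. For messy real-world HTML, a more forgiving parser such as BeautifulSoup or lxml would likely be the safer choice.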
Upvotes: 1