Reputation: 11
I need help with Python programming.
I need a command that can find all the words between tags in a text file.
For example, if the text file contains <concept> food </concept>, I need to find all the words between <concept> and </concept> and display them.
Can anybody help, please?
Upvotes: 1
Views: 3033
Reputation: 7098
There is a great library for HTML/XML traversing named BeautifulSoup. With it:
from BeautifulSoup import BeautifulStoneSoup
# Parse the whole file, then print the text of every <concept> element
soup = BeautifulStoneSoup(open('myfile.xml', 'rt').read())
for t in soup.findAll('concept'):
    print(t.string)
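If BeautifulSoup is not available, a similar traversal can be sketched with the standard library's html.parser module. This swaps in a different technique from the snippet above, and the names ConceptCollector and extract_concepts are made up for this illustration. It assumes the tags are written in lowercase and collects the stripped text of each <concept> element:

```python
from html.parser import HTMLParser


class ConceptCollector(HTMLParser):
    """Collects the text found inside <concept>...</concept> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # greater than 0 while inside a <concept> element
        self.parts = []      # text pieces of the element currently being read
        self.concepts = []   # finished, whitespace-stripped results

    def handle_starttag(self, tag, attrs):
        if tag == "concept":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "concept" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # The outermost <concept> just closed: store its text
                self.concepts.append("".join(self.parts).strip())
                self.parts = []

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_concepts(text):
    """Hypothetical helper: return the text of every <concept> element."""
    parser = ConceptCollector()
    parser.feed(text)
    parser.close()
    return parser.concepts


sample = "intro <concept> food </concept> middle <concept>drink</concept> end"
print(extract_concepts(sample))  # prints ['food', 'drink']
```

For a file, pass the result of open('myfile.xml').read() to extract_concepts instead of the sample string.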
Upvotes: 3
Reputation: 35983
Have a look at regular expressions: http://docs.python.org/library/re.html
If you want, for example, the text inside the <i> tag, try:
import re
text = "text to search. <i>this</i> is the word and also <i>that</i> end"
re.findall("<i>(.*?)</i>", text)  # returns ['this', 'that']
Here's a short explanation of how findall works: it searches the given string for every match of the given regular expression. The regular expression is <i>(.*?)</i>:
<i> denotes just the opening tag <i>
(.*?) creates a group and matches as few characters as possible, stopping at the first </i>
</i> concludes the tag
Note that the above solution does not match something like
<i> here's a line
break </i>
since by default . does not match a newline, and you just wanted to extract words.
However, it is of course possible to match across line breaks:
re.findall("<i>(.*?)</i>", text, re.DOTALL)
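Applied to the <concept> tag from the question, the same idea might look like the sketch below. The function name find_concepts is made up for this example, and stripping the surrounding whitespace is an assumption about what the asker wants:

```python
import re

# Non-greedy group between the tags; DOTALL lets "." match newlines too
CONCEPT_RE = re.compile(r"<concept>(.*?)</concept>", re.DOTALL)


def find_concepts(text):
    """Hypothetical helper: return the stripped text of every <concept> pair."""
    return [match.strip() for match in CONCEPT_RE.findall(text)]


sample = "a <concept> food </concept> b <concept>hot\ndrink</concept> c"
print(find_concepts(sample))  # prints ['food', 'hot\ndrink']
```

To run it on a file, read the contents first, for example with open('myfile.txt').read(), and pass the resulting string to find_concepts.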
Upvotes: 1
Reputation: 328574
Find the opening tag <concept> using pos1 = s.find('<concept>')
Find the closing tag </concept> using pos2 = s.find('</concept>', pos1)
The words you seek are then s[pos1+len('<concept>'):pos2]
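The steps above find only the first pair of tags. A minimal sketch that repeats them in a loop to collect every pair could look like this. The function name concepts_between and its parameters are made up for the example, and it assumes the tags are not nested:

```python
def concepts_between(s, open_tag="<concept>", close_tag="</concept>"):
    """Hypothetical helper: collect the text between each open/close tag pair."""
    results = []
    pos = 0
    while True:
        pos1 = s.find(open_tag, pos)
        if pos1 == -1:
            break  # no further opening tag
        pos2 = s.find(close_tag, pos1 + len(open_tag))
        if pos2 == -1:
            break  # opening tag without a matching closing tag
        results.append(s[pos1 + len(open_tag):pos2].strip())
        pos = pos2 + len(close_tag)  # continue searching after this pair
    return results


s = "x <concept> food </concept> y <concept>drink</concept> z"
print(concepts_between(s))  # prints ['food', 'drink']
```

str.find returns -1 when nothing is found, which is why both lookups are checked before slicing.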
Upvotes: 3