Reputation: 2991
I am doing biomedical named entity extraction using Python.
To cross-check my results, I currently paste the text into http://text0.mib.man.ac.uk/software/geniatagger/ and parse the HTML source of the page I get back after submitting it.
I want to do the same thing from my own GUI: take the text entered in the GUI I have built, submit it to this website, and retrieve the HTML source, so that I don't have to go through the browser every time I cross-check.
Thanks in advance.
Upvotes: 3
Views: 6475
Reputation: 19989
Actually, this is a great question!
The first thing to do is explore the source code of the website a little. If you look at the page source, you will see this block of code:
<form method="POST" action="a.cgi">
<p>
Please enter a text that you want to analyze.
</p>
<p>
<textarea name="paragraph" rows="15" cols="80" wrap="soft">
... some text here ...
### This is a sample. Replace this with your own text.
</textarea>
</p>
<p>
<input type="submit" value="Submit Text" />
<input type="reset" />
</p>
</form>
You can see that the request is sent to the a.cgi address. Since we are already at the address
http://text0.mib.man.ac.uk/software/geniatagger/
the relative form action is resolved against it, so the data will be sent to
http://text0.mib.man.ac.uk/software/geniatagger/a.cgi
But what are we going to send there? The data is sent as the "paragraph" POST parameter: the form's method attribute has the value POST, and the name of the textarea is "paragraph".
We can send the request with this Python code:
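As a small sketch of how the relative form action turns into the full address, the standard library can do the resolution for you. This uses Python 3's `urllib.parse.urljoin`; the variable names are only illustrative.

```python
from urllib.parse import urljoin

# Address of the page that contains the form.
page_url = "http://text0.mib.man.ac.uk/software/geniatagger/"

# Value of the form's action attribute, relative to the page.
form_action = "a.cgi"

# Resolve the relative action against the page address.
submit_url = urljoin(page_url, form_action)
print(submit_url)
```

Because the page address ends with a slash, the action is simply appended to it.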
# Python 2 code (urllib2 does not exist in Python 3).
import urllib
import urllib2

# Sample text to tag; replace it with your own.
text = """
Further, while specific constitutive binding to the peri-kappa B site is seen in monocytes, stimulation with phorbol esters induces additional, specific binding. Understanding the monocyte-specific function of the peri-kappa B factor may ultimately provide insight into the different role monocytes and T-cells play in HIV pathogenesis.
### This is a sample. Replace this with your own text.
"""

# The textarea is named "paragraph", so that is the POST parameter.
data = {
    "paragraph": text
}
encoded_data = urllib.urlencode(data)

# Passing a data argument makes urlopen send a POST request.
content = urllib2.urlopen("http://text0.mib.man.ac.uk/software/geniatagger/a.cgi",
                          encoded_data)
print content.readlines()
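For readers on Python 3, where `urllib2` no longer exists, here is a rough equivalent using `urllib.request`. It is only a sketch: the function names `build_request` and `fetch_tagged_html` are made up for illustration, and the UTF-8 decoding and 30 second timeout are assumptions about the server, not something the page states.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SUBMIT_URL = "http://text0.mib.man.ac.uk/software/geniatagger/a.cgi"


def build_request(text):
    """Build a POST request carrying the text as the "paragraph" field."""
    # urlencode returns a str; the request body must be bytes.
    body = urlencode({"paragraph": text}).encode("ascii")
    # Supplying data makes this a POST request.
    return Request(SUBMIT_URL, data=body)


def fetch_tagged_html(text):
    """Submit the text and return the HTML of the result page as a string."""
    request = build_request(text)
    with urlopen(request, timeout=30) as response:
        # Assumption: the page is UTF-8 (or close enough to decode leniently).
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_tagged_html("some sentence")` would then play the role of the `content` variable above, provided the service is still reachable at that address.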
So what do we have so far? We have an "engine" for your GUI program. You can parse this content variable with Python's HTMLParser (optional). You also mentioned that you want to display this in a GUI. You can do that with GTK or Qt and map this functionality to a single button; you will need to read a tutorial, but it is really easy for this purpose. If you run into problems, just comment on this post and I can extend this answer with the GUI part.
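To give an idea of the optional parsing step, here is a minimal sketch that pulls the visible text out of an HTML string with the standard library parser (Python 3's `html.parser`; in Python 2 the module is called `HTMLParser`). The class name `TextCollector`, the helper `extract_text` and the sample HTML are all hypothetical; I have not reproduced the real layout of the result page, so you would need to adapt this to whatever tags it actually uses.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every non-empty piece of text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; skip whitespace-only pieces.
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


def extract_text(html):
    """Return the list of text chunks contained in an HTML string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical stand-in for the HTML returned by the server.
sample_html = "<html><body><h1>Result</h1><pre>monocytes NNS</pre></body></html>"
print(extract_text(sample_html))
```

In a GUI, the button handler would fetch the page, pass the HTML to something like `extract_text`, and show the result in a text widget.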
Upvotes: 5