Reputation: 23
I am trying to write a crawler with twill for a page that requires me to log in. I can fill in the form, but twill's submit() function does not seem to "click" the button.
Form name=fSSUser_Logon (#1)
## ## __Name__________________ __Type___ __ID________ __Value__________________
1     TFORM                    hidden    TFORM        SSUser.Logon
2     TPAGID                   hidden    TPAGID       SRLpKQyn1yc8
3     TEVENT                   hidden    TEVENT
4     TXREFID                  hidden    TXREFID      2
5     TOVERRIDE                hidden    TOVERRIDE
6     TDIRTY                   hidden    TDIRTY       1
7     TWKFL                    hidden    TWKFL
8     TWKFLI                   hidden    TWKFLI
9     TFRAME                   hidden    TFRAME
10    TWKFLL                   hidden    TWKFLL
11    TWKFLJ                   hidden    TWKFLJ
12    TREPORT                  hidden    TREPORT
13    TRELOADCMP               hidden    TRELOADCMP
14    TRELOADID                hidden    TRELOADID    SRLpKQy1nyc7
15    TOVERLAY                 hidden    TOVERLAY
16    RELOGON                  hidden    RELOGON
17    USERNAME                 text      USERNAME
18    PASSWORD                 password  PASSWORD
19    Logon                    button    Logon        Logon
The above is the output of showforms() on the page.
And the actual code for the button is the following:
<input type="button" class="clsButton" id="Logon" name="Logon" tabindex="3" value="Logon" title="Logon">
It does not have any formaction I can use.
My code thus far:
from twill.commands import *
from twill import get_browser
go("https://trakcarelabwebview.nhls.ac.za/trakcarelab/csp/logon.csp")
showforms()
fv("1", "USERNAME", "xx")
fv("1", "PASSWORD", "xx")
fv("1", "Logon", "Logon")
formaction('Logon','https://trakcarelabwebview.nhls.ac.za/trakcarelab/csp/logon.csp#TRAK_main')
submit()
show()
showforms()
The frame "TRAK_main" is the one with the HTML I need. The last showforms() shows exactly the same forms as before the "login".
What am I doing wrong here?
Upvotes: 0
Views: 2214
Reputation: 48599
What am I doing wrong here?
Your answer lies here:
twill does not understand javascript.
When a browser loads that page, the js on the page executes and assigns an onclick event handler to the Logon button. When the Logon button is clicked, the event handler function sets the value of one of the hidden form fields. The server checks for that value in the request; if the value is absent, the login fails and the server redirects back to the login page.
Because twill does not understand js, the hidden form field never gets set, so the request twill sends to the server is missing that value.
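One possible workaround is to do by hand what the onclick handler does: collect the form fields, set the hidden field yourself, and send the request. The following is only a minimal sketch under assumptions. The field name TEVENT and the value HYPOTHETICAL_EVENT are placeholders, not taken from the site; you would have to read the page's js to find out which hidden field the handler sets and to what. The sample HTML is a cut-down stand-in for the real login page, and the server may also depend on cookies or other checks not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of the <input> elements in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        input_type = (attr.get("type") or "text").lower()
        # A type="button" input is not included in a normal form submission,
        # so it is skipped here.
        if name and input_type != "button":
            self.fields[name] = attr.get("value") or ""


def build_login_payload(html, username, password, overrides):
    """Return the form data with credentials and manual overrides applied."""
    collector = FormFieldCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload["USERNAME"] = username
    payload["PASSWORD"] = password
    # Values the js handler would have set; these must be found by
    # reading the page's scripts.
    payload.update(overrides)
    return payload


# Cut-down stand-in for the real login page (assumption).
sample_html = """
<form name="fSSUser_Logon" method="post">
  <input type="hidden" name="TFORM" value="SSUser.Logon">
  <input type="hidden" name="TEVENT" value="">
  <input type="text" name="USERNAME">
  <input type="password" name="PASSWORD">
  <input type="button" name="Logon" value="Logon">
</form>
"""

# "TEVENT" / "HYPOTHETICAL_EVENT" are placeholders, not the site's real values.
payload = build_login_payload(
    sample_html, "xx", "xx", {"TEVENT": "HYPOTHETICAL_EVENT"}
)
body = urlencode(payload)
print(body)
```

The encoded body could then be POSTed with any HTTP client that keeps the session cookies from the initial page load. Whether the server accepts it depends on what the handler really does.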
Websites try all kinds of tricks to keep programs from accessing their pages.
Upvotes: 0