Python requests: POST login to a secure website

Question

I am still learning Python, and this is my first attempt at accessing websites and scraping information for myself. I am trying to get my head around the language, so any input is welcome.

The data below is what I see in the page source. I have to visit one page to enter my login name; after a successful entry, I am redirected to another page for my password. I am trying to make these POSTs with Python requests. I have to go through two pages before I can scrape the information on the third page. However, I can only get past the first login page.

Here are the headers and POST data sent for the USERNAME.

For the USERNAME PAGE:

(Request-Line)  
POST /client/factor2UserId.recip;jsessionid=15AD9CDEB48362372EFFC268C146BBFC HTTP/1.1
Host    www.card.com
User-Agent  Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0
Accept  text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-US,en;q=0.5
Accept-Encoding gzip, deflate
DNT 1
Referer https://www.card.com/client/
Cookie  JSESSIONID=15AD9CDEB48362372EFFC268C146BBFC
Connection  keep-alive
Content-Type    application/x-www-form-urlencoded
Content-Length  13

Post Data: 
login,  USERLOGIN

Here are the headers and POST data sent for the password:

For the PASSWORD PAGE:
(Request-Line)  
POST /client/siteLogonClient.recip HTTP/1.1
Host    www.card.com
User-Agent  Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0
Accept  text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-US,en;q=0.5
Accept-Encoding gzip, deflate
DNT 1
Referer https://www.card.com/client/factor2UserId.recip;jsessionid=15AD9CDEB48362372EFFC268C146BBFC
Cookie  JSESSIONID=15AD9CDEB48362372EFFC268C146BBFC
Connection  keep-alive
Content-Type    application/x-www-form-urlencoded
Content-Length  133

Post Data: 
org.apache.struts.taglib.html.TOKEN,     583ed0aefe4b04b
login,  USERLOGIN
password, PASSWORD

This is what I have come up with; however, I can only access the first page. I am redirected back to the first page once I call my function second_pass().

With first_pass(), I receive response code 200. I receive the same code from second_pass(), but if I print the text of the page, it is page one again. I never successfully get to page three.

import requests
import re

response = None
r = None

payload = {'login' : 'USERLOGIN'}
# accesses the username screen and submits the username
# give login name
def first_pass():
    global response
    global payload
    url = 'https://www.card.com/client/factor2UserId.recip'
    s = requests.Session()
    response = s.post(url, payload)
    return response.status_code


# merges payload with x that contains user password
def second_pass():
    global payload
    global r
    # global response
    x = {'password' : 'PASSWORD'} # I add the Password in this function cause I am not sure if it will fail the first_pass()
    payload.update(x)
    url = 'https://www.card.com/client/siteLogonClient.recip'
    r = requests.post(url, payload)
    return payload
    return r.status_code



# searches response for Token!
# if token found merges key:value into payload
def token_search():
    global response
    global payload
    f = response.text

    # uses regex to find the Token from the HTML
    patFinder2 = re.compile(r"name=\"(org.apache.struts.taglib.html.TOKEN)\"\s+value=\"(.+)\"",re.I)
    findPat2 = re.search(patFinder2, f)

    # if the Token is found it turns it into a dictionary and prints the dictionary
    # if no Token is found it prints "No Token Found"
    if(findPat2):
        newdict = dict(zip(findPat2.group(1).split(), findPat2.group(2).split()))
        payload.update(newdict)
        print payload
    else:
        print "No Token Found"

I currently call my functions from the shell, in this order: first_pass(), token_search(), second_pass().

When I call token_search(), it updates the dictionary with unicode strings. I am not sure if that is what causes my errors.

Any advice on the code would be most welcome. I enjoy learning, but at this point I am beating my head against the wall.

brechin · Accepted Answer

If you're getting into scraping data, then I'd recommend learning about libraries like lxml or BeautifulSoup to more robustly gather data from pages (vs. using regex).
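
As a rough illustration of the parser-based approach, here is a minimal sketch that pulls the Struts token out of a form with Python 3's built-in html.parser rather than lxml or BeautifulSoup (chosen only so it runs without extra installs; the idea is the same). The class name HiddenInputParser, the helper find_token, and the sample HTML are made up for this example, and the real page's markup may differ.

```python
from html.parser import HTMLParser

TOKEN_NAME = "org.apache.struts.taglib.html.TOKEN"


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs from every <input> tag it sees."""

    def __init__(self):
        super().__init__()
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name is not None:
            # Inputs without a value attribute are stored as an empty string.
            self.inputs[name] = attr_map.get("value") or ""


def find_token(html_text):
    """Return {TOKEN_NAME: value}, or None if the page has no such field."""
    parser = HiddenInputParser()
    parser.feed(html_text)
    if TOKEN_NAME in parser.inputs:
        return {TOKEN_NAME: parser.inputs[TOKEN_NAME]}
    return None


# Hypothetical sample markup, not copied from the real site.
# Note that value comes before name here, which a fixed-order regex would miss.
sample_html = """
<form action="/client/siteLogonClient.recip" method="post">
  <input type="hidden" value="583ed0aefe4b04b"
         name="org.apache.struts.taglib.html.TOKEN">
  <input type="password" name="password">
</form>
"""

print(find_token(sample_html))
```

Unlike the regex, this does not depend on the order of the attributes or on the whitespace between them.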

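If the token-finding code works, then my recommendation would be to rearrange the code like this. It avoids global variables, keeping variables in the scope they belong in. It also sends both POSTs through the same requests.Session, so any cookies set by the first response (such as JSESSIONID) are sent with the second request; the second_pass() in the question used a plain requests.post, which does not share cookies with the session.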

import re
import requests


def login(username, password):
    loginPayload = {'login': username}
    passPayload = {'password': password}
    # One Session for both requests, so cookies from the first POST carry over
    s = requests.Session()

    # POST the username
    url = 'https://www.card.com/client/factor2UserId.recip'
    postData = loginPayload.copy()
    response = s.post(url, postData)
    if response.status_code != requests.codes.ok:
        raise ValueError("Bad response in first pass %s" % response.status_code)
    postData.update(passPayload)
    tokenParam = token_search(response.text)
    if tokenParam is not None:
        postData.update(tokenParam)
    else:
        raise ValueError("No token value found!")
    # POST with password and the token
    url = 'https://www.card.com/client/siteLogonClient.recip'
    r = s.post(url, postData)
    return r


def token_search(resp_text):
    # uses regex to find the Token from the HTML
    # [^\"]+ stops at the closing quote, so the match cannot run on past the value
    patFinder2 = re.compile(r"name=\"(org\.apache\.struts\.taglib\.html\.TOKEN)\"\s+value=\"([^\"]+)\"", re.I)
    findPat2 = patFinder2.search(resp_text)

    # if the Token is found, return it as a one-item dictionary
    # if no Token is found, print "No Token Found" and return None
    if findPat2:
        return {findPat2.group(1): findPat2.group(2)}
    else:
        print("No Token Found")
        return None


# Call login() only after both functions are defined
login('USERLOGIN', 'PASSWORD')
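
A 200 status alone does not show that the login worked, since the question reports a 200 on the second POST even though the body was page one again. Here is a minimal sketch of a stricter check. It assumes the username page can be recognised either by its factor2UserId.recip URL or by a form field named login with no password field alongside it; both are guesses based on the request dumps in the question, and looks_like_username_page is a hypothetical helper.

```python
import re

# Assumption: the username page is served from this path
# (taken from the request dump in the question).
USERNAME_PAGE_PATH = "factor2UserId.recip"


def looks_like_username_page(final_url, body):
    """Heuristic: True if we appear to be back on the username (first) page."""
    if USERNAME_PAGE_PATH in final_url:
        return True
    has_login_field = re.search(r'name=["\']login["\']', body, re.I) is not None
    has_password_field = re.search(r'name=["\']password["\']', body, re.I) is not None
    # Assumed: page one asks for the login name only, with no password field.
    return has_login_field and not has_password_field


# Made-up inputs, only to show how the helper behaves.
print(looks_like_username_page(
    "https://www.card.com/client/factor2UserId.recip", ""))
print(looks_like_username_page(
    "https://www.card.com/client/somewhere", "<p>Account summary</p>"))
```

With the login() function above, this could be called as looks_like_username_page(r.url, r.text) on the returned response. The response's r.history attribute also lists any redirects that requests followed on the way.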
