Matthew Downey

Reputation: 259

Log into a website (specifically Netflix) with Python

I am trying to log into Netflix with Python. It would work perfectly, but I can't get it to detect whether or not the login failed. The code looks like this:

#this is not purely my code! Thanks to Ori for the code
import urllib
username = raw_input('Enter your email: ')
password = raw_input('Enter your password: ')
params = urllib.urlencode(
{'email': username,
'password': password })
f = urllib.urlopen("https://signup.netflix.com/Login", params)
if "The login information you entered does not match an account in our records.       Remember, your email address is not case-sensitive, but passwords are." in f.read():
    success = False
    print "Either your username or password was incorrect."
else:
    success = True
    print "You are now logged into netflix as", username
    raw_input('Press enter to exit the program')

As always, many thanks!!

Upvotes: 1

Views: 4926

Answers (1)

Mike Pennington

Reputation: 43077

First, I'll just share some verbiage I noticed on the Netflix site under Limitations on Use:

Any unauthorized use of the Netflix service or its contents will terminate the limited license granted by us and will result in the cancellation of your membership.

In short, I'm not sure what your script does after this, but some activities could jeopardize your relationship with Netflix. I did not read the whole ToS, but you should.

That said, there are plenty of legitimate reasons to scrape HTML, and I do it all the time. My first bet with this specific problem is that you're using the wrong detection string. Send a bogus email/password and print the response. Perhaps you made an assumption based on what the page looks like when you log in with a browser, but the browser is sending extra information that gets it further into the process.
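One way a detection string can miss is a difference in spacing or line breaks between what was typed into the script and what the page source actually contains. A minimal sketch of a whitespace-tolerant check follows, in Python 3 syntax rather than the Python 2 of the question. The helper names and the sample page are made up for illustration; the real response still has to be printed and inspected to find the right marker.

```python
import re

def normalize_ws(text):
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text).strip()

def contains_marker(page, marker):
    """Return True if marker occurs in page, ignoring whitespace differences."""
    return normalize_ws(marker) in normalize_ws(page)

# Hypothetical response body, not a real Netflix page.
sample_page = ("<div>The login information you entered\n"
               "   does not match an account in our records.</div>")
marker = "The login information you entered does not match an account in our records."

print(marker in sample_page)                 # plain substring test
print(contains_marker(sample_page, marker))  # whitespace-tolerant test
```

Matching on a short, distinctive fragment of the error message is usually less brittle than matching on the whole sentence.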

I wish I could offer specifics on what to do next, but I would rather not risk my relationship with 'flix to give a better answer to the question... so I'll just share a few observations I gleaned from scraping oodles of other websites that made it kind of hard to use web robots...

First, log in to your account with Firefox, and be sure to have the Live HTTP Headers add-on enabled and in capture mode... what you see when you log in live is invaluable to your scripting efforts... for instance, this was from a session while I logged in...

    POST /Login HTTP/1.1
    Host: signup.netflix.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 115
    Connection: keep-alive
    Referer: https://signup.netflix.com/Login?country=1&rdirfdc=true
    --->Insert lots of private stuff here
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 168
    authURL=sOmELoNgTeXtStRiNg&nextpage=&SubmitButton=true&country=1&email=EmAiLAdDrEsS%40sOmEMaIlProvider.com&password=UnEnCoDeDpAsSwOrD

Pay particular attention to everything below the "Content-Length" field, that is, the POST body and all of its parameters.
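To see exactly which fields the browser submitted, the captured body can be split into name/value pairs. This is a small sketch in Python 3 syntax using the standard library's `urllib.parse.parse_qsl`; the body is the placeholder text from the capture above, not real credentials.

```python
from urllib.parse import parse_qsl

# POST body copied from the capture above (all values are placeholders).
body = ("authURL=sOmELoNgTeXtStRiNg&nextpage=&SubmitButton=true&country=1"
        "&email=EmAiLAdDrEsS%40sOmEMaIlProvider.com&password=UnEnCoDeDpAsSwOrD")

# keep_blank_values=True keeps empty fields such as nextpage,
# which parse_qsl would otherwise drop.
fields = dict(parse_qsl(body, keep_blank_values=True))

for name, value in fields.items():
    print(name, "=", repr(value))
```

Comparing this list with the two fields sent by the script (`email` and `password`) shows what the script leaves out.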

Now log back out, and pull up the login page again... chances are, you will see some of those fields hidden as state information in <input type="hidden"> tags... some web apps keep state by feeding you fields and then they use JavaScript to resubmit that same information in your login POST. I usually use lxml to parse the pages I receive... if you try it, keep in mind that lxml prefers utf-8, so I include code that automagically converts when it sees other encodings...

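    import chardet
    from urllib2 import urlopen

    # req and data (the request and the urlencoded POST body) are assumed
    # to be defined elsewhere
    response = urlopen(req, data)
    # info is from the HTTP headers... like server version
    info = response.info().dict
    # page is the HTML response
    page = response.read()
    # chardet guesses the encoding; the guess can be None
    encoding = chardet.detect(page)['encoding']
    if encoding and encoding.lower() != 'utf-8':
        page = page.decode(encoding, 'replace').encode('utf-8')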
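Pulling the hidden fields out of a login page can be sketched as follows. This version uses the standard library's `html.parser` instead of lxml so that it runs without extra packages, and it is written in Python 3 syntax. The class name, the function name, and the sample form are hypothetical; a real login page will have its own field names.

```python
from html.parser import HTMLParser

class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            # A missing value attribute is treated as an empty string.
            self.fields[attrs["name"]] = attrs.get("value") or ""

def hidden_fields(html):
    """Return a dict of the hidden form fields found in html."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields

# Made-up form for illustration only.
sample_form = """
<form method="post" action="/Login">
  <input type="hidden" name="authURL" value="sOmELoNgTeXtStRiNg">
  <input type="hidden" name="nextpage" value="">
  <input type="text" name="email">
  <input type="password" name="password">
</form>
"""

print(hidden_fields(sample_form))
```

The resulting dict can be merged with the email and password before the POST body is encoded.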

BTW, Michael Foord has a very good reference on urllib2 and many of the assorted issues.

So, in summary:

  1. Using your existing script, dump the results from a known bogus login to be sure you're parsing for the right info... I'm pretty sure you made a bad assumption above
  2. It also looks like you aren't submitting enough parameters in the POST. Experience tells me you need to set authURL in addition to email and password... if possible, I try to mimic what the browser sends...
  3. Occasionally, it matters whether you have set your user-agent string and referring webpage. I always set these when I scrape so I don't waste cycles debugging.
  4. When all else fails, look at the info stored in the cookies they send.
  5. Sometimes websites base64-encode form submission data. I don't know whether Netflix does.
  6. Some websites are very protective of their intellectual property, and programmatically reading/archiving the information is considered a theft of their IP. Again, read the ToS... I don't know how Netflix views what you want to do.
  7. I am providing this for informational purposes and under no circumstances endorse or condone the violation of Netflix terms of service... nor can I confirm whether your proposed activity would... I'm just saying it might :-). Talk to a lawyer who specializes in e-discovery if you need an official ruling. Feet first. Don't eat yellow snow... etc...
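Points 2 and 3 in the list above can be sketched as a helper that builds a POST request carrying the extra form fields plus User-Agent and Referer headers. This is Python 3 syntax, and the URL, field values, and function name are placeholders chosen for illustration. The code only constructs the request; nothing is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_login_request(url, form_fields, referer, user_agent):
    """Build (but do not send) a form POST that mimics a browser login."""
    data = urlencode(form_fields).encode("ascii")
    headers = {"User-Agent": user_agent, "Referer": referer}
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(url, data=data, headers=headers)

# All values below are placeholders, not real endpoints or credentials.
req = build_login_request(
    "https://example.com/Login",
    {
        "authURL": "value-from-hidden-field",
        "email": "user@example.com",
        "password": "secret",
    },
    referer="https://example.com/Login",
    user_agent="Mozilla/5.0 (placeholder)",
)

print(req.get_method())
print(req.data)
print(req.header_items())
```

For point 4, opening such a request through an opener built with `urllib.request.build_opener` and an `http.cookiejar.CookieJar` keeps whatever cookies the server sets, so they can be inspected afterwards.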

Upvotes: 4
