user1295662

Reputation: 21

Cookie to log in with Jsoup?

For a project I'm trying to get data from a page on Goodreads.com that is only accessible when you're logged in. I'm new to Jsoup, since I'm using it only for this particular project. Getting the relevant data from the website is not a problem, but I can't seem to get to the particular page I need. The page I'm trying to access is viewable only when logged in; when not logged in, it redirects to the login page.

I've looked through the answers here, but the answers given so far have not helped.

What I have now:

String url = "http://www.goodreads.com/friend/user/7493379-judith";
Connection.Response res = Jsoup.connect("http://www.goodreads.com/user/sign_in")
        .data("email", "MYEMAIL", "user_password", "MYPASSWORD")
        .method(Connection.Method.POST)
        .execute();

Document doc2 = res.parse();
String sessionId = res.cookie("_session_id");

Document doc = Jsoup.connect(url)
        .cookie("_session_id", sessionId)
        .get();

I got this far with the help of the answers here, but it doesn't work: I'm still only getting the data from the login page it redirects to. I have several questions:

  1. Most importantly, of course: how can I make it work?
  2. The answers here have used .method(Method.POST) instead of .method(Connection.Method.POST). When I use the first one, however, I get an error that Method cannot be resolved. Does anyone know why this is?
  3. The examples I've seen have used "username" and "password" in .data(). What exactly do these refer to? I've now used the name of the input box. Is it the name, the type, the id, or something else? Since Goodreads does not refer to the login as the username but as the e-mail, I assume I have to change them. (username & password don't work either.)
  4. Examples also use http://example.com/login.php as the example URL. Goodreads doesn't have a /login.php page, though. Am I correct to assume I have to use the URL of the login screen?
  5. _session_id is the name of the relevant cookie on Goodreads.

I'd be very grateful if anyone can point me in the right direction!

Upvotes: 2

Views: 3241

Answers (3)

MariuszS

Reputation: 31577

You can log in with this code:

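import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class GoodreadsLogin {

    public static void main(String[] args) throws Exception {

        // GET the home page first: it sets the session cookie and contains the sign-in form
        Connection.Response execute = Jsoup
                .connect("https://www.goodreads.com/")
                .method(Connection.Method.GET).execute();

        // Read the hidden fields the server expects back with the login POST
        Element sign_in = execute.parse().getElementById("sign_in");
        String authenticityToken = sign_in.select("input[name=authenticity_token]").first().val();
        String n = sign_in.select("input[name=n]").first().val();

        // POST the credentials together with the hidden fields and the cookies from the GET
        Document document = Jsoup.connect("https://www.goodreads.com/user/sign_in")
                .data("cookieexists", "✓")
                .data("authenticity_token", authenticityToken)
                .data("user[email]", "MYEMAIL")
                .data("user[password]", "password")
                .data("remember_me", "on")
                .data("n", n)
                .cookies(execute.cookies())
                .post();
    }
}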
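The snippet above keeps only the Document returned by .post(), so the cookies set by the login response are not carried forward to the page the question actually needs. A minimal sketch of that missing step, assuming jsoup is on the classpath and that the form id and field names used above still match the live site (Goodreads' markup may well have changed since): run the POST with .execute() instead, merge its cookies over the ones from the first GET (the login may issue a fresh _session_id), and send the result with the request for the friend page from the question. The class name GoodreadsFriendPage and the helper mergeCookies are hypothetical. It also imports org.jsoup.Connection.Method, which is what lets the short form Method.POST compile.

```java
import java.util.HashMap;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Connection.Method; // makes the short form Method.POST resolve
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class GoodreadsFriendPage {

    // Hypothetical helper: cookies from the later response win, so a session id
    // re-issued at login replaces the one received before logging in.
    static Map<String, String> mergeCookies(Map<String, String> earlier, Map<String, String> later) {
        Map<String, String> merged = new HashMap<>(earlier);
        merged.putAll(later);
        return merged;
    }

    public static void main(String[] args) throws Exception {
        // Same first step as above: GET the page that contains the sign-in form.
        Connection.Response home = Jsoup.connect("https://www.goodreads.com/")
                .method(Method.GET)
                .execute();
        Element signIn = home.parse().getElementById("sign_in");

        // POST the login, but keep the Response so its cookies are not thrown away.
        Connection.Response login = Jsoup.connect("https://www.goodreads.com/user/sign_in")
                .data("cookieexists", "✓")
                .data("authenticity_token", signIn.select("input[name=authenticity_token]").first().val())
                .data("n", signIn.select("input[name=n]").first().val())
                .data("user[email]", "MYEMAIL")
                .data("user[password]", "MYPASSWORD")
                .data("remember_me", "on")
                .cookies(home.cookies())
                .method(Method.POST)
                .execute();

        // Fetch the members-only page with the combined cookie set.
        Document friends = Jsoup.connect("https://www.goodreads.com/friend/user/7493379-judith")
                .cookies(mergeCookies(home.cookies(), login.cookies()))
                .get();
        System.out.println(friends.title());
    }
}
```

Printing the title is only a quick sanity check: if it still looks like the sign-in page, the login did not go through and the form fields or cookies need another look.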

Upvotes: 0

Vincent

Reputation: 5425

  1. Goodreads requires two things when logging in: first, that you have a session ID stored in a cookie, and second, that you send a randomly generated number. You can get both by first visiting the login page without logging in: it sets a cookie with a session ID, and the form contains a hidden input field (i.e. <input type="hidden">) named "n" whose value is a number. Save these and pass them along as a cookie and a form value, respectively, when logging in.

Some remarks about the way I found this out:

The first thing you need to realise is that you're trying to recreate, with Jsoup, the exact same requests your browser makes. So, to check whether what you have right now will work, you can try to recreate the exact same situation with your browser.

To recreate your code, I went to the login page, deleted all my Goodreads cookies (since your code doesn't send any cookies with the login request either), and attempted to sign in passing only the username and password form values. It gave an error that my session had timed out. When I first loaded the login page, then deleted all cookies except the session ID and left the "n" form value in place, I could log in successfully. Therefore, you want to make a plain GET request to the sign-in page first, retrieve the session ID cookie and the hidden form value you get there, and pass them along with the POST request.
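The GET-then-POST sequence described above hinges on pulling the hidden values out of the sign-in form. A small sketch of that extraction step, assuming jsoup is on the classpath; the HTML string is a hypothetical, trimmed-down stand-in for the real form (only the field names n and authenticity_token come from the answers in this thread), and hiddenFields is an illustrative helper name, not part of jsoup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HiddenFields {

    // Collects every <input type="hidden"> inside the given form as name -> value,
    // e.g. the random "n" value the server expects back with the login POST.
    static Map<String, String> hiddenFields(Element form) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Element input : form.select("input[type=hidden]")) {
            String name = input.attr("name");
            if (!name.isEmpty()) {
                fields.put(name, input.val());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical markup; the real sign-in page will contain much more.
        String html = "<form id='sign_in' action='/user/sign_in' method='post'>"
                + "<input type='hidden' name='n' value='667387'>"
                + "<input type='hidden' name='authenticity_token' value='abc123'>"
                + "<input type='email' name='user[email]'>"
                + "<input type='password' name='user[password]'>"
                + "</form>";
        Document doc = Jsoup.parse(html);
        Map<String, String> hidden = hiddenFields(doc.getElementById("sign_in"));
        System.out.println(hidden); // {n=667387, authenticity_token=abc123}
    }
}
```

In a real run you would parse the Response from the GET instead of a string, then hand the map to the POST with .data(hidden) together with .cookies(response.cookies()), so the session cookie and the hidden value travel in the same request.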

  2. It could be that the API changed or that there are simply several ways. Using Connection.Method.POST will do fine, in any case.

  3. Yes, they refer to the names of the input boxes, i.e. each input's name attribute. When a form is submitted, the browser sends every value under its input's name; the id is not sent, so the name is what you should use in .data().

  4. If you look at the source code of the sign-in form, you can see that the "action" attribute of the form element points to the sign-in page itself, so that's where the request is sent.
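To check the last two points above against actual markup, you can let jsoup read a form the way a browser would submit it. A sketch assuming jsoup is on the classpath, with hypothetical markup in which each input's id deliberately differs from its name: FormElement.formData() returns the key/value pairs a browser would send (keyed by name), and absUrl("action") resolves the form's target against the base URI passed to Jsoup.parse. FormInspector is an illustrative class name.

```java
import java.util.List;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

public class FormInspector {

    public static void main(String[] args) {
        // Hypothetical sign-in markup; ids and names differ on purpose.
        String html = "<form id='sign_in' action='/user/sign_in' method='post'>"
                + "<input type='hidden' name='n' value='667387'>"
                + "<input id='user_email' type='email' name='user[email]' value='me@example.com'>"
                + "<input id='user_password' type='password' name='user[password]' value='secret'>"
                + "</form>";
        // The base URI lets absUrl() resolve the relative action attribute.
        Document doc = Jsoup.parse(html, "https://www.goodreads.com/");
        FormElement form = doc.select("form#sign_in").forms().get(0);

        // Where the form is sent: taken from the action attribute.
        System.out.println(form.absUrl("action"));

        // What is sent: one pair per named control, keyed by name (the id never appears).
        List<Connection.KeyVal> data = form.formData();
        for (Connection.KeyVal kv : data) {
            System.out.println(kv.key() + " = " + kv.value());
        }
    }
}
```

Running this against the real sign-in page (parsed from a GET response) is a quick way to see exactly which keys the POST needs.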

PS. As a general tip, you can use the Firefox extension "Tamper Data" to remove form data or even cookies (though there are easier extensions for that).

Upvotes: 1

vacuum

Reputation: 2273

  1. Look carefully at what data is posted on login:

    user[email]:email@email
    remember_me:on
    user[password]:plain_password
    n:667387

So your POST request must send exactly the same keys.

2. Make sure you use the right import: import org.jsoup.Connection.Method; but Connection.Method.POST also works without it.

3. See point 1.

4. Yes, you are correct.

5. What is the question?
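Point 1 is easier to see once the POST body is written out. A sketch using only the JDK (Java 10+ for URLEncoder.encode(String, Charset)), with the placeholder values from the list above; FormBody and encode are illustrative names. It builds the application/x-www-form-urlencoded body a browser sends for a plain HTML form POST, which shows that the server receives the literal key user[email] (percent-encoded), so a field sent as email or username simply does not match.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class FormBody {

    // Percent-encodes each key and value and joins them as key=value&key=value,
    // i.e. the application/x-www-form-urlencoded format of a plain form POST.
    static String encode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Same keys as observed in the browser; the values are placeholders.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user[email]", "email@email");
        fields.put("remember_me", "on");
        fields.put("user[password]", "plain_password");
        fields.put("n", "667387");
        System.out.println(encode(fields));
        // user%5Bemail%5D=email%40email&remember_me=on&user%5Bpassword%5D=plain_password&n=667387
    }
}
```

The brackets in the key are part of the name the server looks up, which is why the keys copied from the browser have to be reproduced exactly in .data().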

Upvotes: 1
