PaperTsar

Reputation: 989

Automated webtasks on javascript enabled webpages, using Java

I'm using Java.

Basically, what I'm trying to achieve is to be able to run automated tasks on the web: for example, log in to a site, do stuff there, fill out forms, log out, etc.

I've already managed to do so (log in and send comments) on simple sites where no JavaScript code is present (only HTML and HTML forms) by sending basic POST and GET requests using the HttpURLConnection class.

The problem is that I came across a site that uses JavaScript code to generate session cookies on the client side. When you access this site, JavaScript code gets executed in the browser and generates the session cookies. Those cookies get sent with each request; without them, the site just won't function. (It is this site: http://www.jofogas.hu/ )

I cannot get those cookies, because I do not know how to find out which JavaScript code gets executed, and I don't know how to run JavaScript code from Java and then use its results.

Upvotes: 2

Views: 72

Answers (1)

Cyrbil

Reputation: 6478

To answer the how:

First, I noticed that no Set-Cookie header sets a cookie named ses, yet this cookie still appears. In JavaScript, a cookie is set by assigning to document.cookie, so I grabbed the page's source code, auto-indented it, and searched for document.cookie. That quickly leads to setNewSessionCookie, which creates a cookie named ses.

function setNewSessionCookie() {
    var b = get_vid();
    var d;
    var e;
    var a;
    var c = new Date();
    c.setTime(c.getTime());
    a = getCookie("ses");
    if (!a) {
        b = get_vid();
        if (!b) {
            return
        }
        d = (new Date()).getTime();
        a = b + "_" + d
    }
    e = new Date(c.getTime() + (1000 * 60 * 15));
    document.cookie = name + "ses=" + a + ";expires=" + e.toGMTString()
}

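This cookie is set to the value of the variable a. When no ses cookie exists yet, a is built from b, the return value of get_vid(), followed by "_" and the current timestamp in milliseconds.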

function get_vid() {
    var b;
    var a = getCookie("vsid");
    if (!a) {
        return false
    }
    if (a.search(",") == "-1") {
        b = Base64.decode(a)
    } else {
        b = a
    }
    return b.split(",")[0]
}

TL;DR: This function gets the vsid cookie (which is sent in a Set-Cookie header), Base64-decodes it if it does not already contain a comma, and returns the part before the comma. The ses cookie is then created by appending "_" + (new Date()).getTime() to that value.

In my case:

getCookie("vsid") // "ODk5NjEwMzEsMTQzNDMwNDM4Ng=="
Base64.decode(getCookie("vsid")) // "89961031,1434304386"
getCookie("ses") // "89961031_1434304048406"
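
For the Java side, here is a rough sketch of the same steps. It assumes the vsid value arrives raw (not URL-encoded) in a Set-Cookie header and that the name prefix in the original script is empty. The class name SesCookie and the helpers visitorId, sesValue and cookieFromHeaders are names made up for illustration; they are not part of the site or of any library. The 15-minute expiry is left out, since only the cookie name and value go into a request's Cookie header.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

final class SesCookie {

    // Port of the site's get_vid(): return the part of vsid before the comma,
    // Base64-decoding first when the raw value contains no comma.
    public static String visitorId(String vsid) {
        if (vsid == null || vsid.isEmpty()) {
            return null;
        }
        String decoded = vsid.contains(",")
                ? vsid
                : new String(Base64.getDecoder().decode(vsid), StandardCharsets.UTF_8);
        return decoded.split(",")[0];
    }

    // Mirrors setNewSessionCookie(): "<visitor id>_<timestamp in milliseconds>".
    public static String sesValue(String vsid, long nowMillis) {
        String id = visitorId(vsid);
        return id == null ? null : id + "_" + nowMillis;
    }

    // Hypothetical helper: find a cookie value in a list of Set-Cookie header lines.
    public static String cookieFromHeaders(List<String> setCookieHeaders, String name) {
        if (setCookieHeaders == null) {
            return null;
        }
        String prefix = name + "=";
        for (String header : setCookieHeaders) {
            // Only the first "name=value" pair matters; attributes follow the first ';'.
            String pair = header.split(";", 2)[0].trim();
            if (pair.startsWith(prefix)) {
                return pair.substring(prefix.length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String vsid = "ODk5NjEwMzEsMTQzNDMwNDM4Ng=="; // example value from this answer
        String ses = sesValue(vsid, System.currentTimeMillis());
        // Value to send as the Cookie request header on later requests.
        System.out.println("Cookie: vsid=" + vsid + "; ses=" + ses);
    }
}
```

With HttpURLConnection, the Set-Cookie lines should be available via getHeaderFields().get("Set-Cookie"), and the resulting string can be sent back with setRequestProperty("Cookie", ...).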

A word of warning about the legal issues around web scraping: it may be against the website's terms of use (which are not in any language I speak...).

Edit: For the question:

Automated webtasks on javascript enabled webpages, using Java

You may be interested in the Selenium project and its Java driver.

Upvotes: 2
