tsotso

Reputation: 352

Identifying user-initiated web requests

Just by looking at a TCP capture of an entire HTTP web-browsing session, is it possible to differentiate between web requests initiated by the user (by clicking a link or typing a URL into the address bar) and web requests sent by the browser to fetch web-page objects (images, iframes, Ajax calls, etc.)?

The Referer header does not solve this, since its value is the same for a user-initiated click on a link and for a browser request for an object embedded in that page.

Upvotes: 3

Views: 666

Answers (1)

Martin

Reputation: 38329

There is no simple solution to this, and I doubt it can even be done reliably, but here are some tips on how you can filter the data in multiple steps:

  1. Keep-Alive: HTTP allows multiple sequential requests on the same TCP connection (if keep-alive is supported). It is probably safe to assume that only the first request on a TCP connection may be user-initiated, while the rest are for images/scripts related to that page. This should significantly reduce the number of requests you need to analyze further.

  2. Content-Type: If you are prepared to assume that only HTML is downloaded through user-initiated requests, you can filter out any request whose response does not have a matching Content-Type (e.g. text/html).

  3. Response body: You are now left with only HTML responses, but from the request/response headers alone it is pretty much impossible to differentiate an iframe request from a clicked link, since Referer will be the same in both cases (even though most iframe downloads have probably been filtered out in step 1). To refine this further you would have to parse every HTML response, look for any <iframe src="..."> or <link rel="prefetch"> that could cause an HTML download that was not user-initiated, and then filter out the requests that were made for those resources.
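
The three steps above could be sketched roughly as follows in Python. This is only an illustration under assumptions: it presumes the capture has already been reassembled into request/response pairs (for example with a separate pcap tool), and the `Transaction` record, its field names, and the `likely_user_initiated` function are hypothetical names invented for this sketch, not part of any library.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class Transaction:
    """Hypothetical record: one HTTP request/response pair from the capture."""
    conn_id: str        # identifier of the TCP connection it was seen on
    index: int          # position of the request on that connection (0 = first)
    url: str            # absolute request URL
    content_type: str   # value of the Content-Type response header
    body: str = ""      # decoded response body (only used for HTML)


class EmbeddedUrlParser(HTMLParser):
    """Collects URLs the browser may fetch as HTML without a user click."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe" and attrs.get("src"):
            self.urls.add(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "prefetch" in rel:
                self.urls.add(urljoin(self.base_url, attrs["href"]))


def is_html(tx):
    # Ignore parameters such as "; charset=utf-8".
    return tx.content_type.split(";")[0].strip().lower() == "text/html"


def likely_user_initiated(transactions):
    # Steps 1 and 2: keep only the first request on each TCP connection,
    # and only if the response was HTML.
    candidates = [tx for tx in transactions if tx.index == 0 and is_html(tx)]

    # Step 3: gather every iframe/prefetch URL referenced by any HTML response
    # and drop candidates that match one of them.
    embedded = set()
    for tx in transactions:
        if is_html(tx):
            parser = EmbeddedUrlParser(tx.url)
            parser.feed(tx.body)
            embedded |= parser.urls
    return [tx for tx in candidates if tx.url not in embedded]
```

The result is a list of candidates rather than a definitive answer: a page the user really clicked through to is still dropped if some other page also embeds it in an iframe, and exact string comparison of URLs ignores differences such as fragments or trailing slashes.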

None of this makes for a perfect analysis, but it might be good enough for your purposes. For instance, detecting requests from <meta> refresh would probably be impossible.

Upvotes: 1
