gdoron

Reputation: 150253

How to collect page views while excluding bots and crawlers in 2016?

We want to add page view counters to our article pages (just like Stack Overflow's), but we don't want to count page views from bots and crawlers.

I searched quite a bit and only found very outdated answers saying to fire an AJAX request, since crawlers and bots don't execute JavaScript. Well, it's 2016, and I believe all the major crawlers execute JavaScript nowadays.

I've thought of two possible solutions:

  1. Keep a list of the User-Agent strings of all known bots and crawlers on the server, and only increment the counter if the request doesn't come from one of them. This seems like a very bad solution, since the list needs to be maintained and updated regularly, and there will probably be many bots it won't catch.
  2. Use AJAX to send a request to an endpoint that is disallowed in robots.txt (or use a hidden image with src="/article/track/?id=xxxxx").
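
For reference, option 1 could look roughly like this. It's a minimal illustration, not a real bot list: the patterns in `BOT_UA_PATTERNS` and the names `isLikelyBot` and `recordView` are hypothetical, and a `Map` stands in for Redis.

```javascript
// Hypothetical, deliberately incomplete list of substrings commonly found
// in crawler User-Agent headers. Keeping it current is the weak point.
const BOT_UA_PATTERNS = [/bot/i, /crawl/i, /spider/i, /slurp/i, /mediapartners/i];

// Returns true when the User-Agent matches any known bot pattern.
// A missing or empty User-Agent is treated as a bot (an assumption).
function isLikelyBot(userAgent) {
  if (!userAgent) return true;
  return BOT_UA_PATTERNS.some((pattern) => pattern.test(userAgent));
}

// Increments the article's view count only for non-bot requests.
// `counts` is a Map standing in for the real store (e.g. Redis).
function recordView(counts, articleId, userAgent) {
  if (isLikelyBot(userAgent)) return false;
  counts.set(articleId, (counts.get(articleId) || 0) + 1);
  return true;
}
```

Any crawler whose User-Agent isn't matched by the list is counted as a real visitor, which is exactly the maintenance problem with this option.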

The second option adds another request per page view. That's not horrible, but maybe there's a better way? What is the common way of handling this today?

We're using ASP.NET Core and storing the page views in Redis, if it matters.

Upvotes: 9

Views: 446

Answers (2)

gdoron

Reputation: 150253

I found out how Stack Overflow itself handles it:

<script>
    StackExchange.ready(function(){$.get('/posts/40008735/ivc/e079');});
</script>
<noscript>
    <div>
        <img src="/posts/40008735/ivc/e079" class="dno" alt="" width="0" height="0">
    </div>
</noscript>

And in robots.txt:

Disallow: /*/ivc/*
...
User-agent: Googlebot-Image
Disallow: /*/ivc/*
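
To check which URLs a rule like `Disallow: /*/ivc/*` covers, here is a rough sketch of robots.txt wildcard matching as standardized in RFC 9309 (and documented by Google): `*` matches any sequence of characters, a trailing `$` anchors the end of the path, and otherwise a rule matches as a prefix. The function names are hypothetical, and the sketch ignores `Allow` rules and longest-match precedence.

```javascript
// Converts a robots.txt path pattern into an equivalent RegExp.
function robotsPatternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*") // each "*" becomes ".*" after the pieces are escaped
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  // Without a trailing "$" the rule only has to match a prefix of the path.
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// Returns true if any Disallow pattern matches the given URL path.
function isDisallowed(path, disallowPatterns) {
  return disallowPatterns.some((p) => robotsPatternToRegExp(p).test(path));
}
```

Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but nothing stops a bot from ignoring it and requesting the tracking URL anyway.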

So basically, they handle it as I suggested in option 2: issue an AJAX request (or load a hidden img when JavaScript is disabled), and use a Disallow rule in robots.txt to tell crawlers and bots not to crawl that URL.
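
A minimal sketch of the server side of that approach, assuming a route shaped like the Stack Overflow URL above (`/posts/{id}/ivc/{token}`). The handler name `handleTrackingRequest` is hypothetical, the meaning of the trailing token (`e079` above) is unknown so it is treated as opaque, and a `Map` stands in for Redis, where the increment would typically be a single `INCR` on a per-post key.

```javascript
// Hypothetical route pattern: /posts/{numericId}/ivc/{token}.
// The token must be present but is otherwise ignored.
const TRACK_ROUTE = /^\/posts\/(\d+)\/ivc\/([^/]+)$/;

// Increments the view counter for the post named in the tracking URL.
// Returns false (and counts nothing) for any other path.
function handleTrackingRequest(path, counts) {
  const match = TRACK_ROUTE.exec(path);
  if (!match) return false;
  const postId = match[1];
  counts.set(postId, (counts.get(postId) || 0) + 1);
  return true;
}
```

Because the counter only moves when this separate URL is fetched, crawlers that respect the Disallow rule never increment it, even if they render the article page and execute its JavaScript.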

Upvotes: 4

Oliver Salzburg

Reputation: 22099

As I mentioned in chat, you could cache the client's IP address when it requests /robots.txt.

On other requests, check whether the IP address is in the cache, and don't count the request as a page view if it is.
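
A rough sketch of that idea, assuming an in-memory cache with a time-to-live. The class name, method names and the one-day TTL are all hypothetical choices for illustration; with several web servers the cache would need to live somewhere shared, such as Redis.

```javascript
// Hypothetical TTL: how long an IP stays flagged after fetching /robots.txt.
const ROBOTS_TTL_MS = 24 * 60 * 60 * 1000;

class RobotsTxtIpCache {
  constructor(ttlMs = ROBOTS_TTL_MS) {
    this.ttlMs = ttlMs;
    this.expiries = new Map(); // ip -> timestamp (ms) when the flag expires
  }

  // Call from the /robots.txt handler.
  recordRobotsTxtRequest(ip, now = Date.now()) {
    this.expiries.set(ip, now + this.ttlMs);
  }

  // True if this IP fetched /robots.txt within the TTL window.
  isFlagged(ip, now = Date.now()) {
    const expiry = this.expiries.get(ip);
    if (expiry === undefined) return false;
    if (now >= expiry) {
      this.expiries.delete(ip); // lazily evict stale entries
      return false;
    }
    return true;
  }

  // A page view counts only if the IP is not flagged.
  shouldCountView(ip, now = Date.now()) {
    return !this.isFlagged(ip, now);
  }
}
```

Two caveats: crawlers that never fetch /robots.txt are still counted, and real users who share an IP address with a crawler (for example behind NAT) stop being counted until the entry expires.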

Upvotes: 1
