Waleed Eissa

Reputation: 10513

Logging requests on high traffic websites

I wonder how high-traffic websites handle traffic logging. For example, a website like myspace.com receives a lot of hits, and I can imagine it would take a lot of space to log all those requests. So, do they log every single request, or how do they handle this?

Upvotes: 0

Views: 1188

Answers (7)

cherouvim

Reputation: 31903

If by logging you mean collecting server-related information (request and response times, DB and CPU usage per request, etc.), I think they sample only 10% or 1% of the traffic. That gives the same result (providing developers with auditing information) without filling up the disks or slowing the site down.
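For what it's worth, a minimal sketch of that kind of request sampling, assuming a Node.js server; the 1% rate and the file name are illustrative, not anything a real site necessarily uses:

const http = require('http');
const fs = require('fs');

const SAMPLE_RATE = 0.01; // log roughly 1% of requests

const server = http.createServer((req, res) => {
  if (Math.random() < SAMPLE_RATE) {
    // Asynchronous append, so even the sampled write doesn't hold up the response
    fs.appendFile('sampled.log',
      new Date().toISOString() + ' ' + req.method + ' ' + req.url + '\n',
      () => {});
  }
  res.end('ok');
});

server.listen(8080);

Because only a small, random slice of requests is written, the aggregate numbers stay statistically useful while disk I/O drops by orders of magnitude.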

Upvotes: 0

Christian Nunciato

Reputation: 10409

I'd be extremely surprised if they didn't log every single request, yes, and operations with particularly high traffic volumes usually roll their own log-management solutions against the raw server logs, in some form or other -- sometimes as simple batch-type processes, sometimes as complete subsystems.

One company I worked for, back in the dot-com heyday, got upwards of twenty million pageviews a day; for that site (actually a set of them, running across a few dozen machines in all, as I recall), our ops team wrote a quite sophisticated, clustered solution in C that parsed, translated (into relational storage), compressed and distributed the logs daily. Log files, especially verbose ones, pile up fast, and the commercial solutions available at the time just couldn't cut it.
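Just to illustrate the "simple batch-type process" end of that spectrum, here is a rough sketch that tallies hits per path from a space-delimited access log. It assumes Node.js and a file named access.log; a real pipeline would load the results into relational storage rather than print them:

const fs = require('fs');
const readline = require('readline');

const counts = {};
const rl = readline.createInterface({ input: fs.createReadStream('access.log') });

rl.on('line', (line) => {
  // In common/combined log format the request path is the 7th space-separated field
  const path = line.split(' ')[6];
  if (path) counts[path] = (counts[path] || 0) + 1;
});

rl.on('close', () => {
  console.log(counts); // hits per path for the whole file
});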

Upvotes: 0

Razor

Reputation: 17498

We had a similar issue with our Intranet, which is used by hundreds of people. The disk activity was huge and performance was suffering.

The short answer is asynchronous, non-blocking logging.
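A minimal sketch of what that can look like, assuming a Node.js process: log calls only push onto an in-memory buffer, and a timer flushes it to disk in one asynchronous write (the file name and interval are illustrative):

const fs = require('fs');

const buffer = [];
const FLUSH_INTERVAL_MS = 1000;

function log(entry) {
  // No disk I/O on the request path, just an in-memory push
  buffer.push(new Date().toISOString() + ' ' + entry);
}

setInterval(() => {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length).join('\n') + '\n';
  // One asynchronous append per interval instead of one blocking write per request
  fs.appendFile('requests.log', batch, (err) => {
    if (err) console.error('log flush failed', err);
  });
}, FLUSH_INTERVAL_MS);

// Usage from a request handler: log('GET /home 200');

The trade-off is that a crash can lose the last interval's worth of entries, which is usually acceptable for traffic logs.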

Upvotes: 3

annakata

Reputation: 75794

ZXTM traffic shaping and logging, speaking from experience here

Upvotes: 0

Brent Ozar

Reputation: 13274

If you view source on a MySpace page, you get the answer:

<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-6293770-1");
pageTracker._setDomainName(".myspace.com");
pageTracker._setSampleRate("1"); //sets sampling rate to 1 percent
pageTracker._trackPageview(); 
</script>

That script means they're using Google Analytics.

They can't just gauge traffic using IIS logs because they may sell ads to third parties, and third parties won't take your word for how much traffic you get. They want independent numbers from a separate company, and that's where Google Analytics comes in.

Just for future reference - whenever you've got a question about how a web site is doing something, try viewing the source. You'd be amazed at what you can find there in plain view.

Upvotes: 3

user58670

Reputation: 1448

I don't know how they track it, since I don't work there. I am pretty sure that they have enough storage to record every little thing about their users if they wanted to.

If I were them, I would use AWStats if I just wanted to know basic stuff about my users. It is more likely that they have developed their own scripts for tracking their users. Stuff they would log:

- IP address
- referrer
- time
- browser
- OS

and so on. Then a script to see different data about the users, varying by day, week, or month. As cbrulak said, something along the lines of Analytics, but since they have access to the actual database, they can learn much more about their users.
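As a rough sketch of collecting those fields server-side, assuming Node.js and its plain http module (a real tracker would parse the User-Agent into separate browser and OS fields, which is skipped here):

const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  const record = {
    ip: req.socket.remoteAddress,
    referrer: req.headers['referer'] || '',
    time: new Date().toISOString(),
    userAgent: req.headers['user-agent'] || '' // carries both browser and OS hints
  };
  // Append one JSON line per request; a scheduled job could roll this into a database
  fs.appendFile('tracking.log', JSON.stringify(record) + '\n', () => {});
  res.end('ok');
}).listen(8080);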

Upvotes: 0

cbrulak

Reputation: 15629

Probably something like Google Analytics.

Use JavaScript to load a page from a different server, etc.
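Something along these lines, where stats.example.com stands in for a hypothetical collector on that other server; this mirrors how Analytics-style trackers request a tiny image with the data packed into the query string:

<script type="text/javascript">
// Fire-and-forget beacon: the browser fetches the image, the stats server logs the hit
var img = new Image();
img.src = 'http://stats.example.com/track.gif' +
          '?page=' + encodeURIComponent(location.pathname) +
          '&ref=' + encodeURIComponent(document.referrer);
</script>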

Upvotes: 1
