Reputation: 6474
I am building a web crawler in Java. The crawler visits websites, reads and stores data in a database using JDBC, and also stores files locally or on cloud storage.
As part of the crawling, I want to record the exact usage details of the crawler, with parameters like:
Number of sites visited (HTTP+HTTPS)
Number of bytes of data received over one run of the crawler
Number of bytes of data sent over one run of the crawler
Number of rows updated/inserted/deleted/selected via JDBC over that run of the crawler
Number of bytes of data stored+accessed in local machine (on which the crawler is running)
Number of bytes of data stored+accessed in cloud storage (like Amazon S3)
Is there any quick way to accomplish some or all of the above? Maybe some library that can be plugged in to my Java app? Will I have to individually note down all of the above parameters at every stage when the crawler performs some action (like visit a website, download data etc)? I don't want the program to get bogged down simply because I want to measure and track the above parameters.
I am looking to use the crawler as both a desktop app and a web app, so solutions for both are welcome.
Upvotes: 0
Views: 154
Reputation: 51445
Will I have to individually note down all of the above parameters at every stage when the crawler performs some action (like visit a website, download data etc)?
Yes.
You're just adding numbers to int or long fields in a global statistics class that you'll have to write for your application. Addition is cheap, so your program should not get bogged down performing it.
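As a rough illustration, such a statistics class might look like the sketch below. The class name `CrawlerStats`, its fields, and its methods are all hypothetical and only cover some of the parameters in the question. It assumes Java 8 or later and uses `LongAdder` so that several crawler threads can update the counters safely; a single-threaded crawler could use plain `long` fields instead.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical statistics holder; every name here is illustrative.
final class CrawlerStats {
    // One shared instance that the whole application can reach.
    static final CrawlerStats GLOBAL = new CrawlerStats();

    // LongAdder keeps concurrent additions cheap and thread-safe.
    private final LongAdder sitesVisited = new LongAdder();
    private final LongAdder bytesReceived = new LongAdder();
    private final LongAdder bytesSent = new LongAdder();
    private final LongAdder jdbcRows = new LongAdder();

    // Call these at the point where the crawler performs the action.
    void siteVisited() { sitesVisited.increment(); }
    void received(long bytes) { bytesReceived.add(bytes); }
    void sent(long bytes) { bytesSent.add(bytes); }
    void rowsAffected(long rows) { jdbcRows.add(rows); }

    // Read the totals at the end of a run.
    long totalSitesVisited() { return sitesVisited.sum(); }
    long totalBytesReceived() { return bytesReceived.sum(); }
    long totalBytesSent() { return bytesSent.sum(); }
    long totalJdbcRows() { return jdbcRows.sum(); }

    // Human-readable report of the current totals.
    String summary() {
        return "sites=" + totalSitesVisited()
             + ", bytesReceived=" + totalBytesReceived()
             + ", bytesSent=" + totalBytesSent()
             + ", jdbcRows=" + totalJdbcRows();
    }
}
```

Each action in the crawler then costs one extra call. For example, since `Statement.executeUpdate` returns the affected row count, a JDBC update could be recorded with `CrawlerStats.GLOBAL.rowsAffected(stmt.executeUpdate(sql))`, and `CrawlerStats.GLOBAL.summary()` could be logged when the run finishes.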
Upvotes: 1