Reputation: 2847
I'm a Perl programmer with some nice scripts that fetch HTTP pages (from a text-file list of URLs) with cURL and save them to a folder.
However, the number of pages to get is in the tens of millions. Sometimes the script fails on number 170,000 and I have to restart it manually. On restart it reads each URL, checks whether that page has already been downloaded, and skips it if so. But with a few hundred thousand pages already saved, it still takes a few hours just to skip back up to where it left off. Obviously, this is not going to pan out in the end.
I've been told that instead of saving to a text file, which is hard to search and modify, I should use a database. I don't know much about databases; I just messed around with MySQL on a school server a year ago. I need the ability to add millions of rows and a few static columns, to search/modify a row quickly, and to do it all locally on a LAN (or on a single computer if that's difficult). And of course, I need to access this database from Perl.
Where should I start? What do I need to download to get a server started on Windows? Which Perl modules should I use? (I'm using an ActiveState distro)
Upvotes: 2
Views: 541
Reputation: 27183
Since you only need to search on one column, you may wish to consider a key/value store database like Berkeley DB, using either the BerkeleyDB or DB_File module.
Generally, you can think of these key/value databases as Perl hashes that live on disk rather than in memory. Exact-key lookups are very fast; everything else requires scanning the whole dataset.
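As a rough sketch of the tied-hash approach (the file name and URL below are just placeholders), tracking which URLs have already been fetched with DB_File could look something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DB_File;
    use Fcntl qw(O_CREAT O_RDWR);

    # 'fetched_urls.db' is only an example file name.
    my $dbfile = 'fetched_urls.db';

    # Tie a hash to an on-disk Berkeley DB hash file.
    my %fetched;
    tie %fetched, 'DB_File', $dbfile, O_CREAT | O_RDWR, 0666, $DB_HASH
        or die "Cannot open $dbfile: $!";

    my $url = 'http://example.com/page1.html';

    # Exact-key lookup stays fast even with millions of entries.
    unless (exists $fetched{$url}) {
        # ... fetch and save the page here ...
        $fetched{$url} = 1;    # mark it as downloaded
    }

    untie %fetched;

Because the hash is backed by a file, the "already downloaded" state survives a crash, so a restart can resume immediately instead of re-checking every URL.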
Upvotes: 5
Reputation: 12341
There are many sorts of databases, but if you've already decided on an SQL database and want to keep the setup simple, you might want to have a look at SQLite and the DBI/DBD::SQLite modules, which let you use it from Perl.
Upvotes: 6
Reputation: 74232
Look into DBI. If you do not like SQL in your programs, try SQL::Abstract.
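For instance (table and column names invented here), SQL::Abstract builds the SQL string and bind values for you, which you then hand to DBI as usual:

    use strict;
    use warnings;
    use DBI;
    use SQL::Abstract;

    my $dbh = DBI->connect('dbi:SQLite:dbname=urls.db', '', '',
                           { RaiseError => 1 });
    my $sql = SQL::Abstract->new;

    # Generate the statement and bind values instead of writing SQL by hand.
    my ($stmt, @bind) = $sql->select('pages', ['url'], { status => 'pending' });

    my $sth = $dbh->prepare($stmt);   # e.g. "SELECT url FROM pages WHERE ( status = ? )"
    $sth->execute(@bind);

    while (my ($url) = $sth->fetchrow_array) {
        print "$url\n";
    }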
Upvotes: 1