Reputation: 1423
I'm creating a tool for scraping links from multiple URLs. I want to store this information, then test the scraped links for their status.
I'm expecting to have to test a lot of links, about 60,000, so the problem I have is deciding how to store the links to test.
What I'm thinking of doing is creating one text file per URL I'll be scraping. That means about 40 text files (the URLs I'm scraping are really the same URL, just regionalised).
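For illustration, a minimal sketch of that plan in Java, assuming one text file per region and a simple HEAD request for the status check; the file-naming scheme, region names, and timeouts here are placeholders, not part of the question:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class LinkChecker {

    // Append scraped links to one text file per region (file name is a placeholder).
    static void saveLinks(String region, List<String> links) throws IOException {
        Files.write(Paths.get("links-" + region + ".txt"), links,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Check a single link's HTTP status using a HEAD request.
    static int status(String link) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
        conn.setRequestMethod("HEAD");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        return conn.getResponseCode();
    }
}
```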
Upvotes: 0
Views: 468
Reputation: 6051
IMHO the easiest approach is to use serialization to save your information, for example serializing a Map<String, Set<String>>
of URLs. Multiple text files should work too, without any serious performance impact, but they're slightly longer to implement.
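A minimal sketch of the serialization idea, assuming the map is built from serializable implementations such as HashMap and HashSet; the class and file names are just examples:

```java
import java.io.*;
import java.util.Map;
import java.util.Set;

public class LinkStore {

    // Serialize the whole map (source URL -> scraped links) to a single file.
    public static void save(Map<String, Set<String>> links, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(links);
        }
    }

    // Read the map back for the status-testing phase.
    @SuppressWarnings("unchecked")
    public static Map<String, Set<String>> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Map<String, Set<String>>) in.readObject();
        }
    }
}
```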
Another approach: register on MongoLab and use a free account. (This isn't advertising, I just like the service.) You don't need to install anything; just download the MongoDB driver and go ahead.
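A minimal sketch of storing one scraped link with the MongoDB Java driver (mongodb-driver-sync); the connection string, database, collection, and field names are placeholders you'd replace with your own:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class MongoLinkStore {
    public static void main(String[] args) {
        // Placeholder connection string for a hosted MongoDB instance.
        try (MongoClient client = MongoClients.create("mongodb://user:pass@host:27017/scraper")) {
            MongoCollection<Document> links =
                    client.getDatabase("scraper").getCollection("links");

            // One document per scraped link; the status field can be filled in later.
            links.insertOne(new Document("source", "https://example.com/uk")
                    .append("url", "https://example.com/some-page")
                    .append("status", 200));
        }
    }
}
```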
Upvotes: 1