Reputation: 1028
I need to parse HTML files that may contain up to 500,000 links, of which about 400,000 will be of interest to me.
Should I collect all the links that satisfy the condition into a new list and then insert the elements of that list into the database?
Or should I add each link to the database (SQLite) as soon as I find one that satisfies the condition, and commit it right away? Is such a large number of commits a problem?
I do not want to lose data in case of a failure, such as a power outage. That is why I want to commit after every insert.
What is the best way to insert a large number of items into the database?
Upvotes: 1
Views: 131
Reputation: 7343
You can try a NoSQL database like MongoDB. With MongoDB I inserted 500,000 documents with 6 fields each in about 15 seconds (on my old laptop), and simple queries took about 0.023 s.
Upvotes: 0
Reputation: 91149
If these links are spread across several files, what about a commit after processing each file? That way you could also remember which files you have already processed.
In the case of a single file, record the file offset after each commit so you can resume cleanly.
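A minimal sketch of the single-file idea, assuming the links arrive one per line in a plain-text file and using hypothetical `links` and `progress` tables (the real code would extract links from HTML instead). The inserted rows and the new file offset land in the same transaction, so a crash can never leave them out of step:

```python
import os
import sqlite3
import tempfile

def process_file(conn, path, batch_size=1000):
    """Insert matching links from one file; each commit also records the
    file offset so processing can resume exactly where it left off."""
    cur = conn.cursor()
    row = cur.execute("SELECT offset FROM progress WHERE path = ?",
                      (path,)).fetchone()
    with open(path, "rb") as f:          # binary mode so tell() works mid-iteration
        f.seek(row[0] if row else 0)     # resume from the last committed offset
        batch = []
        for raw in f:
            link = raw.decode().strip()
            if link.startswith("http"):  # stand-in for the real filter condition
                batch.append((link,))
            if len(batch) >= batch_size:
                _flush(cur, path, f.tell(), batch)
                conn.commit()            # links + offset in one transaction
                batch = []
        _flush(cur, path, f.tell(), batch)
        conn.commit()                    # flush the final partial batch

def _flush(cur, path, offset, batch):
    cur.executemany("INSERT INTO links (url) VALUES (?)", batch)
    cur.execute("INSERT OR REPLACE INTO progress (path, offset) VALUES (?, ?)",
                (path, offset))

# demo with an in-memory database (use a file path for real durability)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT)")
conn.execute("CREATE TABLE progress (path TEXT PRIMARY KEY, offset INTEGER)")
with tempfile.NamedTemporaryFile("w", delete=False) as tf:
    tf.write("\n".join(f"http://example.com/{n}" for n in range(2500)))
process_file(conn, tf.name)
process_file(conn, tf.name)  # rerunning adds nothing: offset is already at EOF
print(conn.execute("SELECT COUNT(*) FROM links").fetchone()[0])
os.unlink(tf.name)
```

Because the second call seeks straight to the saved offset, restarting after a power failure repeats at most one uncommitted batch of work.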
Upvotes: 1
Reputation: 304463
Consider committing only after every 1,000 records or so.
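A minimal sketch of that batching with Python's `sqlite3` module, assuming a hypothetical `links` table; if the power fails between commits, at most `batch_size` rows of work are lost and can be redone:

```python
import sqlite3

def store_links(conn, links, batch_size=1000):
    """Insert links one by one, but commit only every batch_size rows."""
    cur = conn.cursor()
    count = 0
    for link in links:
        cur.execute("INSERT INTO links (url) VALUES (?)", (link,))
        count += 1
        if count % batch_size == 0:
            conn.commit()        # at most batch_size rows are at risk
    conn.commit()                # flush the final partial batch

# demo with an in-memory database (use a file path for real durability)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT)")
store_links(conn, (f"http://example.com/{n}" for n in range(5000)))
print(conn.execute("SELECT COUNT(*) FROM links").fetchone()[0])  # prints 5000
```

Grouping inserts this way is much faster than committing per row, because SQLite syncs to disk on every commit.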
Upvotes: 4