Kubas

Reputation: 1028

Python: big list and inserting into a database

I have to parse HTML files that can contain up to 500,000 links, of which about 400,000 will match my condition.

Should I first collect all the links that satisfy the condition in a list, and then loop over that list and insert its elements into the database?

Or should I insert each link into the database (SQLite) and commit as soon as I find one that satisfies the condition? Is such a large number of commits a problem?

I do not want to lose data in case of a failure such as a power outage. That is why I want to commit after every insert.

What is the best way to put a large number of items into the database?
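To illustrate, here is a minimal sketch of the two options I am deciding between (the links table and url column are just placeholder names):

    import sqlite3

    conn = sqlite3.connect("links.db")
    conn.execute("CREATE TABLE IF NOT EXISTS links (url TEXT)")

    def insert_commit_each(urls):
        # Option 1: commit after every insert -- durable, but every
        # commit forces a disk sync, so 400,000 commits are very slow.
        for url in urls:
            conn.execute("INSERT INTO links (url) VALUES (?)", (url,))
            conn.commit()

    def insert_all_at_once(urls):
        # Option 2: collect everything first and insert in a single
        # transaction -- fast, but the whole batch is lost if the
        # power fails before the one commit at the end.
        conn.executemany("INSERT INTO links (url) VALUES (?)",
                         ((url,) for url in urls))
        conn.commit()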

Upvotes: 1

Views: 131

Answers (3)

Denis

Reputation: 7343

You could try a NoSQL database like MongoDB. With Mongo, adding 500,000 documents with 6 fields each took about 15 seconds (on my old laptop), and simple queries ran in about 0.023 seconds.
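A rough sketch of such a bulk load with pymongo 3+ (the scraper database and links collection names are made up; assumes a mongod running on localhost):

    from pymongo import MongoClient

    client = MongoClient()
    collection = client.scraper.links

    # insert_many sends the documents to the server in large batches,
    # which is far faster than one round trip per document.
    docs = [{"url": "http://example.com/%d" % i} for i in range(500000)]
    collection.insert_many(docs)

    # With an index, simple lookups stay fast even at this size.
    collection.create_index("url")
    print(collection.find_one({"url": "http://example.com/42"}))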

Upvotes: 0

glglgl

Reputation: 91149

If these links are spread across several files, what about committing after each file is processed? Then you could also record which files you have already processed.

In the case of a single file, record the file offset after each commit so that you can resume cleanly.
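A minimal sketch of the per-file variant (the done_files bookkeeping table is an assumption, and extract_links is a naive stand-in for real HTML parsing):

    import re
    import sqlite3

    conn = sqlite3.connect("links.db")
    conn.execute("CREATE TABLE IF NOT EXISTS links (url TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS done_files (name TEXT PRIMARY KEY)")

    def extract_links(path):
        # crude placeholder for proper HTML parsing: grab href values
        with open(path) as f:
            return re.findall(r'href="([^"]+)"', f.read())

    def process_file(path):
        done = conn.execute("SELECT 1 FROM done_files WHERE name = ?",
                            (path,)).fetchone()
        if done:
            return  # finished before a crash -- skip it on restart
        urls = extract_links(path)
        conn.executemany("INSERT INTO links (url) VALUES (?)",
                         ((u,) for u in urls))
        conn.execute("INSERT INTO done_files (name) VALUES (?)", (path,))
        conn.commit()  # one commit per file: links and marker land together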

Upvotes: 1

John La Rooy

Reputation: 304463

Consider just doing a commit after every 1000 records or so.
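For example (a sketch; the links table and url column are placeholder names):

    import sqlite3

    conn = sqlite3.connect("links.db")
    conn.execute("CREATE TABLE IF NOT EXISTS links (url TEXT)")

    def store_links(urls, batch=1000):
        # At worst the last uncommitted batch (up to 1000 links) is
        # lost on a power failure, while the run is much faster than
        # committing after every row.
        for i, url in enumerate(urls, 1):
            conn.execute("INSERT INTO links (url) VALUES (?)", (url,))
            if i % batch == 0:
                conn.commit()
        conn.commit()  # flush the final partial batch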

Upvotes: 4
