Radio Controlled

Reputation: 950

sqlite3 - Add database to indexed database and index new rows

In sqlite3, say I have an indexed database. By indexed I mean that every string value in the rows of all tables has been replaced with an integer, and for each field (across all tables) there is a mapping table that stores the integer-to-string correspondence.

Now I would like to add to my indexed database another database which has not yet been processed in this way, i.e., its rows still contain the actual strings. The tables have exactly the same structure, so for each table in the new database I want to append all its rows to the corresponding table in the old database, but with the strings replaced by integers according to the string-integer mapping of the old database (and with the mapping updated for any strings in the new database that the old one has not seen yet).

Is there a preferred way to do this?

The application is a distributed indexing procedure: approximately 10 jobs fill a FIFO queue with smaller databases, and 1 job picks databases from the queue, indexes them, and adds them to the final big database.
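For concreteness, here is a minimal sketch of what such an indexed database might look like (table and column names are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Indexed table: string values replaced by integers.
con.execute("CREATE TABLE data (ColX INTEGER, ColY INTEGER)")

# One mapping table per field, storing the integer <-> string pairs.
con.execute("CREATE TABLE ColX_strings (ID INTEGER PRIMARY KEY, Text TEXT UNIQUE)")
con.execute("CREATE TABLE ColY_strings (ID INTEGER PRIMARY KEY, Text TEXT UNIQUE)")

# The raw row ('apple', 'red') is stored as (1, 1):
con.execute("INSERT INTO ColX_strings(Text) VALUES ('apple')")
con.execute("INSERT INTO ColY_strings(Text) VALUES ('red')")
con.execute("INSERT INTO data VALUES (1, 1)")
```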

Upvotes: 0

Views: 97

Answers (2)

Radio Controlled

Reputation: 950

I accessed the database from Python.

What I ended up doing is to maintain the indexing structures in memory as Python objects until all the content-bearing integer rows have been added to the new database. This allows bulk inserts via the executemany() function. Once all the rows are in the database, the Python index objects are bulk-inserted into the database as well.

This is extremely fast compared to using the database itself as the indexing structure on the fly. However, it has at least the following two disadvantages:

  1. The stored values cannot be foreign keys referencing the index table entries (but the integers are not that large anyway)

  2. The (Python) index objects need to fit into memory (in another language they would probably require much less memory than they do in Python)

I have to say that I did not try using an SSD, which might make the in-memory approach unnecessary.
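A minimal sketch of this approach, assuming a single table with two string columns sharing one mapping (all names and sample data are hypothetical):

```python
import sqlite3

# In-memory index maintained as a plain Python dict: string -> integer ID.
index = {}

def lookup(s):
    # Assign the next integer the first time a string is seen.
    if s not in index:
        index[s] = len(index) + 1
    return index[s]

raw_rows = [("apple", "red"), ("pear", "green"), ("apple", "green")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (ColX INTEGER, ColY INTEGER)")

# Bulk-insert the integer rows via executemany().
con.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [tuple(lookup(v) for v in row) for row in raw_rows],
)

# After all rows are added, bulk-insert the mapping itself.
con.execute("CREATE TABLE strings (ID INTEGER PRIMARY KEY, Text TEXT UNIQUE)")
con.executemany("INSERT INTO strings(Text, ID) VALUES (?, ?)",
                list(index.items()))
```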

Upvotes: 0

CL.

Reputation: 180010

The indexed tables would look like this:

CREATE TABLE data ( ColX INTEGER, ColY INTEGER );
-- foreign key constraints omitted

Let's assume that the index tables have a UNIQUE constraint on the string column:

CREATE TABLE ColX_strings ( ID INTEGER PRIMARY KEY, Text TEXT UNIQUE );
CREATE TABLE ColY_strings ( ID INTEGER PRIMARY KEY, Text TEXT UNIQUE );

Then we can simply use INSERT OR IGNORE to add only those strings that are actually new, i.e., that do not conflict with any existing one:

INSERT OR IGNORE INTO ColX_strings(Text) SELECT ColX FROM newDB.data;
INSERT OR IGNORE INTO ColY_strings(Text) SELECT ColY FROM newDB.data;

Now all strings are known, so we can simply look them up when copying:

INSERT INTO oldDB.data(ColX, ColY)
SELECT (SELECT ID FROM ColX_strings WHERE Text = T.ColX),
       (SELECT ID FROM ColY_strings WHERE Text = T.ColY)
FROM newDB.data AS T;
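From Python, these statements could be driven roughly as follows (a sketch using two in-memory databases in place of the actual files; real code would connect to the old file and ATTACH the new one):

```python
import sqlite3

# Main connection holds the old (indexed) database.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE data (ColX INTEGER, ColY INTEGER);
CREATE TABLE ColX_strings (ID INTEGER PRIMARY KEY, Text TEXT UNIQUE);
CREATE TABLE ColY_strings (ID INTEGER PRIMARY KEY, Text TEXT UNIQUE);
INSERT INTO ColX_strings(Text) VALUES ('apple');
INSERT INTO ColY_strings(Text) VALUES ('red');
INSERT INTO data VALUES (1, 1);
""")

# Attach the unprocessed database (a second in-memory db here,
# normally a file name) and give it some raw string rows.
con.execute("ATTACH DATABASE ':memory:' AS newDB")
con.executescript("""
CREATE TABLE newDB.data (ColX TEXT, ColY TEXT);
INSERT INTO newDB.data VALUES ('apple', 'green'), ('pear', 'red');
""")

# The three statements from above; since the old database is the
# main database of this connection, no oldDB. prefix is needed.
con.executescript("""
INSERT OR IGNORE INTO ColX_strings(Text) SELECT ColX FROM newDB.data;
INSERT OR IGNORE INTO ColY_strings(Text) SELECT ColY FROM newDB.data;
INSERT INTO data(ColX, ColY)
SELECT (SELECT ID FROM ColX_strings WHERE Text = T.ColX),
       (SELECT ID FROM ColY_strings WHERE Text = T.ColY)
FROM newDB.data AS T;
""")
```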

Upvotes: 1
