davidmytton
davidmytton

Reputation: 39264

How can I merge many SQLite databases?

If I have a large number of SQLite databases, all with the same schema, what is the best way to merge them together in order to perform a query on all databases?

I know it is possible to use ATTACH to do this but it has a limit of 32 and 64 databases depending on the memory system on the machine.

Upvotes: 101

Views: 121636

Answers (8)

Bash helper that automatically merges all tables from each DB

Here's a concise Bash version of https://stackoverflow.com/a/68526717/895245 that loops over all tables of the given DBs that have the same schema:

sqlite-merge-dbs

#!/usr/bin/env bash
set -eu
outdb="$1"
shift
indb0="$1"
shift
cp "$indb0" "$outdb"
for table in $(sqlite3 "$outdb" "SELECT name FROM sqlite_master WHERE type='table'"); do
  echo "table: $table"
  for db in "$@"; do
    echo "db: $db"
    sqlite3 "$outdb" "attach '$db' as 'db2'" "insert into \"$table\" select * from \"db2\".\"$table\""
  done
done

This works by extracting the table names from the special sqlite_master table.

Sample usage:

sqlite-merge-dbs out.sqlite in0.sqlite in1.sqlite in2.sqlite

Test:

rm -f in0.sqlite in1.sqlite in2.sqlite

sqlite3 in0.sqlite 'create table t(i integer, j integer)'
sqlite3 in1.sqlite 'create table t(i integer, j integer)'
sqlite3 in2.sqlite 'create table t(i integer, j integer)'
sqlite3 in0.sqlite 'insert into t values (1, -1), (2, -2)'
sqlite3 in1.sqlite 'insert into t values (3, -3), (4, -4)'
sqlite3 in2.sqlite 'insert into t values (5, -5), (6, -6)'

sqlite3 in0.sqlite 'create table s(k integer, l integer)'
sqlite3 in1.sqlite 'create table s(k integer, l integer)'
sqlite3 in2.sqlite 'create table s(k integer, l integer)'
sqlite3 in0.sqlite 'insert into s values (11, -11), (12, -12)'
sqlite3 in1.sqlite 'insert into s values (13, -13), (14, -14)'
sqlite3 in2.sqlite 'insert into s values (15, -15), (16, -16)'

./sqlite-merge-dbs out.sqlite in0.sqlite in1.sqlite in2.sqlite

sqlite3 out.sqlite 'select * from t'
echo
sqlite3 out.sqlite 'select * from s'

Output:

1|-1
2|-2
3|-3
4|-4
5|-5
6|-6

11|-11
12|-12
13|-13
14|-14
15|-15
16|-16

A Python version of the same:

#!/usr/bin/env python

import os
import argparse
import sqlite3
import shutil

parser = argparse.ArgumentParser()
parser.add_argument('out')
parser.add_argument('ins', nargs='+')
args = parser.parse_args()

shutil.copyfile(args.ins[0], args.out)
con = sqlite3.connect(args.out)
cur = con.cursor()
tables = list(map(lambda e: e[0], cur.execute("SELECT name FROM sqlite_master WHERE type='table'")))
for db2 in args.ins[1:]:
    cur.execute(f"attach '{db2}' as 'db2'")
    for table in tables:
        cur.execute(f"insert into {table} select * from db2.{table}")
    con.commit()
    cur.execute("detach database db2")

Bash helper that also skips autoincremented columns

The sqlite-merge-dbs script may undesirably blow up if there are autoincremented PK columns in the tables which overlap.

The following helper overcomes that by looking into the pragma table_info helper to decide if columns are autoincremented, and skips those from the merge, letting them be re-autoincremented at the end.

sqlite-merge-dbs-id

#!/usr/bin/env bash
set -eu
outdb="$1"
shift
indb0="$1"
shift
cp "$indb0" "$outdb"
for table in $(sqlite3 "$outdb" "SELECT name FROM sqlite_master WHERE type='table'"); do
  echo "table: $table"
  cols="$(sqlite3 "$outdb" "pragma table_info($table)" | awk -F\| '{ if ( $3!="INTEGER" || $6=="0" ) { print $2 } else { print "NULL" } }' | paste -sd , -)"
  echo $cols
  for db in "$@"; do
    echo "db: $db"
    sqlite3 "$outdb" "attach '$db' as 'db2'" "insert into \"$table\" select $cols from \"db2\".\"$table\""
  done
done

Test:

rm -f in0.sqlite in1.sqlite in2.sqlite

sqlite3 in0.sqlite 'create table t(id integer primary key, i integer, j integer)'
sqlite3 in1.sqlite 'create table t(id integer primary key, i integer, j integer)'
sqlite3 in2.sqlite 'create table t(id integer primary key, i integer, j integer)'
sqlite3 in0.sqlite 'insert into t values (NULL, 1, -1), (NULL, 2, -2)'
sqlite3 in1.sqlite 'insert into t values (NULL, 3, -3), (NULL, 4, -4)'
sqlite3 in2.sqlite 'insert into t values (NULL, 5, -5), (NULL, 6, -6)'

sqlite3 in0.sqlite 'create table s(id integer primary key, k integer, l integer)'
sqlite3 in1.sqlite 'create table s(id integer primary key, k integer, l integer)'
sqlite3 in2.sqlite 'create table s(id integer primary key, k integer, l integer)'
sqlite3 in0.sqlite 'insert into s values (NULL, 11, -11), (NULL, 12, -12)'
sqlite3 in1.sqlite 'insert into s values (NULL, 13, -13), (NULL, 14, -14)'
sqlite3 in2.sqlite 'insert into s values (NULL, 15, -15), (NULL, 16, -16)'

./sqlite-merge-dbs-id out.sqlite in0.sqlite in1.sqlite in2.sqlite

sqlite3 out.sqlite 'select * from t'
echo
sqlite3 out.sqlite 'select * from s'

Output:

1|1|-1
2|2|-2
3|3|-3
4|4|-4
5|5|-5
6|6|-6

1|11|-11
2|12|-12
3|13|-13
4|14|-14
5|15|-15
6|16|-16

pragma table_info is documented at: https://www.sqlite.org/pragma.html#pragma_table_info

Columns in the result set include: "name" (its name); "type" (data type if given, else ''); "notnull" (whether or not the column can be NULL); "dflt_value" (the default value for the column); and "pk" (either zero for columns that are not part of the primary key, or the 1-based index of the column within the primary key).

We can observe it in this example with:

sqlite3 in0.sqlite 'pragma table_info(t)'

which outputs:

0|id|INTEGER|0||1
1|i|INTEGER|0||0
2|j|INTEGER|0||0

and in the script we just skip over anything that is both INTEGER and a pk.

A Python version of the same:

#!/usr/bin/env python

import os
import argparse
import sqlite3
import shutil

parser = argparse.ArgumentParser()
parser.add_argument('out')
parser.add_argument('ins', nargs='+')
args = parser.parse_args()

shutil.copyfile(args.ins[0], args.out)
con = sqlite3.connect(args.out)
cur = con.cursor()
tables = list(map(lambda e: e[0], cur.execute("SELECT name FROM sqlite_master WHERE type='table'")))
table_to_pk_col = {}
table_to_insert = {}
table_to_cols = {}
for table in tables:
    cols = cur.execute(f'pragma table_info({table})').fetchall()
    table_to_cols[table] = cols
    for row in cols:
        col_name = row[1]
        type_ = row[2]
        pk = row[5]
        if type_ == 'INTEGER' and pk != 0:
            if table in table_to_pk_col:
                del table_to_pk_col[table]
            else:
                table_to_pk_col[table] = col_name
table_to_insert = { table: ','.join(list(map(
    lambda c: 'NULL' if c[1] == table_to_pk_col.get(table, None) else c[1], table_to_cols[table]
))) for table in tables }
for db2 in args.ins[1:]:
    cur.execute(f"attach '{db2}' as 'db2'")
    cur.execute(f"begin")
    for table in tables:
        cur.execute(f"insert into {table} select {table_to_insert[table]} from db2.{table}")
    con.commit()
    cur.execute("detach database db2")

Preserve foreign keys to autoincrement pks

OK, this is the boss level!!! Bash was getting too cumbersome so I pulled out some Python:

sqlite-merge-dbs-id-ref.py

#!/usr/bin/env python

import os
import argparse
import sqlite3
import shutil

parser = argparse.ArgumentParser()
parser.add_argument('out')
parser.add_argument('ins', nargs='+')
args = parser.parse_args()

shutil.copyfile(args.ins[0], args.out)
con = sqlite3.connect(args.out)
cur = con.cursor()
tables = list(map(lambda e: e[0], cur.execute("SELECT name FROM sqlite_master WHERE type='table'")))
table_to_pk_col = {}
table_to_insert = {}
table_to_cols = {}
table_to_pk_count = {}
table_to_col_to_foreign = {}
for table in tables:
    col_to_foreign = {}
    table_to_col_to_foreign[table] = col_to_foreign
    cols = cur.execute(f'pragma foreign_key_list({table})').fetchall()
    for col in cols:
        col_name = col[3]
        target_table = col[2]
        col_to_foreign[col_name] = target_table
for table in tables:
    cols = cur.execute(f'pragma table_info({table})').fetchall()
    table_to_cols[table] = cols
    for row in cols:
        col_name = row[1]
        type_ = row[2]
        pk = row[5]
        if type_ == 'INTEGER' and pk != 0:
            if table in table_to_pk_col:
                del table_to_pk_col[table]
            else:
                table_to_pk_col[table] = col_name
    if table in table_to_pk_col:
        table_to_pk_count[table] = cur.execute(f'select max({table_to_pk_col[table]}) from {table}').fetchone()[0]
    else:
        table_to_pk_count[table] = cur.execute(f'select count(*) from {table}').fetchone()[0]
def inc_str(table, col):
    if table in table_to_col_to_foreign:
        col_to_foreign = table_to_col_to_foreign[table]
        if col in col_to_foreign:
            return f'+{table_to_pk_count[col_to_foreign[col]]}'
    return ''
for db2 in args.ins[1:]:
    cur.execute(f"attach '{db2}' as 'db2'")
    table_to_pk_count_inc = {}
    for table in tables:
        table_to_insert = {
            table: ','.join(list(map(
                lambda c: 'NULL' if c[1] == table_to_pk_col.get(table, None) else \
                    c[1] + inc_str(table, c[1]),
                table_to_cols[table]
            ))) for table in tables
        }
        cur.execute(f"insert into {table} select {table_to_insert[table]} from db2.{table}")
        table_to_pk_count_inc[table] = cur.rowcount
    for table in tables:
        table_to_pk_count[table] += table_to_pk_count_inc[table]
    con.commit()
    cur.execute("detach database db2")

The script supposes that any INTEGER PRIMARY KEY that is also the only pk of the table is autoincremented.

This is the test we need to pass, which has tables t and s linked up by table ref with foreign keys to both:

rm -f in0.sqlite in1.sqlite in2.sqlite

sqlite3 in0.sqlite 'create table t(id integer primary key, i integer, j integer)'
sqlite3 in1.sqlite 'create table t(id integer primary key, i integer, j integer)'
sqlite3 in2.sqlite 'create table t(id integer primary key, i integer, j integer)'
sqlite3 in0.sqlite 'insert into t values (1, 1, -1), (2, 2, -2), (3, 0, 0)'
sqlite3 in1.sqlite 'insert into t values (1, 3, -3), (2, 4, -4), (3, 0, 0)'
sqlite3 in2.sqlite 'insert into t values (1, 5, -5), (2, 6, -6), (3, 0, 0)'

sqlite3 in0.sqlite 'create table s(id integer primary key, i integer, j integer)'
sqlite3 in1.sqlite 'create table s(id integer primary key, i integer, j integer)'
sqlite3 in2.sqlite 'create table s(id integer primary key, i integer, j integer)'
sqlite3 in0.sqlite 'insert into s values (1, 1, -1), (2, 2, -2)'
sqlite3 in1.sqlite 'insert into s values (1, 3, -3), (2, 4, -4)'
sqlite3 in2.sqlite 'insert into s values (1, 5, -5), (2, 6, -6)'

for i in 0 1 2; do
  sqlite3 "in$i.sqlite" <<EOF
create table ref(
  k integer,
  l integer,
  primary key(k, l),
  foreign key(k) references t(id),
  foreign key(l) references s(id)
)
EOF
done
sqlite3 in0.sqlite 'insert into ref values (1, 2)'
sqlite3 in1.sqlite 'insert into ref values (1, 2)'
sqlite3 in2.sqlite 'insert into ref values (1, 2)'

./sqlite-merge-dbs-id-ref.py out.sqlite in0.sqlite in1.sqlite in2.sqlite

echo t
sqlite3 out.sqlite 'select * from t'
echo s
sqlite3 out.sqlite 'select * from s'
echo ref
sqlite3 out.sqlite 'select * from ref'

Output:

t
1|1|-1
2|2|-2
3|0|0
4|3|-3
5|4|-4
6|0|0
7|5|-5
8|6|-6
9|0|0
+ echo

s
1|1|-1
2|2|-2
3|3|-3
4|4|-4
5|5|-5
6|6|-6
+ echo

ref
1|2
4|4
7|6

So now we see that tables t and s got their PKs increment as before, but for ref something more complex happened: the script increments the foreign keys accordingly with PK increments, so we've maintained the relations correctly.

E.g. line:

4|4

from ref links up 4|3|-3 from t and 4|4|-4 from s. And these values were previously linked up before the merge with:

sqlite3 in1.sqlite 'insert into ref values (1, 2)'

which linked up values from:

sqlite3 in1.sqlite 'insert into t values (1, 3, -3), (2, 4, -4), (3, 0, 0)'
sqlite3 in1.sqlite 'insert into s values (1, 3, -3), (2, 4, -4)'

But on the joined table, the IDs are now 4 and 4 rather than 1 and 2, and we've correctly accounted for that.

The script relies on the forign_key_list pragma to list foreign keys https://www.sqlite.org/pragma.html#pragma_foreign_key_list E>g.:

sqlite3 in0.sqlite 'pragma foreign_key_list(ref)'

gives:

0|0|s|l|id|NO ACTION|NO ACTION|NONE
1|0|t|k|id|NO ACTION|NO ACTION|NONE

For reference, table_info gives:

sqlite3 in0.sqlite 'pragma table_info(s)'

Gives:

0|k|INTEGER|0||1
1|l|INTEGER|0||2

Tested on Ubuntu 23.04, sqlite 3.40.1.

Upvotes: 3

Mohammadsadegh
Mohammadsadegh

Reputation: 131

Here is a simple python code to either merge two database files or scan a directory to find all database files and merge them all together (by simply inserting all data in other files to the first database file found).Note that this code just attaches the databases with the same schema.

import sqlite3
import os


def merge_databases(db1, db2):
    con3 = sqlite3.connect(db1)

    con3.execute("ATTACH '" + db2 +  "' as dba")

    con3.execute("BEGIN")
    for row in con3.execute("SELECT * FROM dba.sqlite_master WHERE type='table'"):
        combine = "INSERT OR IGNORE INTO "+ row[1] + " SELECT * FROM dba." + row[1]
        print(combine)
        con3.execute(combine)
    con3.commit()
    con3.execute("detach database dba")


def read_files(directory):
    fname = []
    for root,d_names,f_names in os.walk(directory):
        for f in f_names:
            c_name = os.path.join(root, f)
            filename, file_extension = os.path.splitext(c_name)
            if (file_extension == '.sqlitedb'):
                fname.append(c_name)

    return fname

def batch_merge(directory):
    db_files = read_files(directory)
    for db_file in db_files[1:]:
        merge_databases(db_files[0], db_file)

if __name__ == '__main__':
    batch_merge('/directory/to/database/files')

Upvotes: 13

dfrankow
dfrankow

Reputation: 21377

To summarize from the Nabble post in DavidM's answer:

attach 'c:\test\b.db3' as toMerge;           
BEGIN; 
insert into AuditRecords select * from toMerge.AuditRecords; 
COMMIT; 
detach toMerge;

Repeat as needed.

Note: added detach toMerge; as per mike's comment.

Upvotes: 113

Taba
Taba

Reputation: 4316

If you have reached the bottom of this feed and yet didn't find your solution, here is also a way to merge the tables of 2 or more sqlite databases.

First try to download and install DB browser for sqlite database. Then try to open your databases in 2 windows and try merging them by simply drag and drop tables from one to another. But the problem is that you can just drag and drop only one table at a time and therefore its not really a solution for this answer specifically but yet it can used to save some time from further searches if your database is small.

Upvotes: -2

Pedro Lobito
Pedro Lobito

Reputation: 98871

Late answer, but you can use:

#!/usr/bin/python

import sys, sqlite3

class sqlMerge(object):
    """Basic python script to merge data of 2 !!!IDENTICAL!!!! SQL tables"""

    def __init__(self, parent=None):
        super(sqlMerge, self).__init__()

        self.db_a = None
        self.db_b = None

    def loadTables(self, file_a, file_b):
        self.db_a = sqlite3.connect(file_a)
        self.db_b = sqlite3.connect(file_b)

        cursor_a = self.db_a.cursor()
        cursor_a.execute("SELECT name FROM sqlite_master WHERE type='table';")

        table_counter = 0
        print("SQL Tables available: \n===================================================\n")
        for table_item in cursor_a.fetchall():
            current_table = table_item[0]
            table_counter += 1
            print("-> " + current_table)
        print("\n===================================================\n")

        if table_counter == 1:
            table_to_merge = current_table
        else:
            table_to_merge = input("Table to Merge: ")

        return table_to_merge

    def merge(self, table_name):
        cursor_a = self.db_a.cursor()
        cursor_b = self.db_b.cursor()

        new_table_name = table_name + "_new"

        try:
            cursor_a.execute("CREATE TABLE IF NOT EXISTS " + new_table_name + " AS SELECT * FROM " + table_name)
            for row in cursor_b.execute("SELECT * FROM " + table_name):
                print(row)
                cursor_a.execute("INSERT INTO " + new_table_name + " VALUES" + str(row) +";")

            cursor_a.execute("DROP TABLE IF EXISTS " + table_name);
            cursor_a.execute("ALTER TABLE " + new_table_name + " RENAME TO " + table_name);
            self.db_a.commit()

            print("\n\nMerge Successful!\n")

        except sqlite3.OperationalError:
            print("ERROR!: Merge Failed")
            cursor_a.execute("DROP TABLE IF EXISTS " + new_table_name);

        finally:
            self.db_a.close()
            self.db_b.close()

        return

    def main(self):
        print("Please enter name of db file")
        file_name_a = input("File Name A:")
        file_name_b = input("File Name B:")

        table_name = self.loadTables(file_name_a, file_name_b)
        self.merge(table_name)

        return

if __name__ == '__main__':
    app = sqlMerge()
    app.main()

SRC : Tool to merge identical SQLite3 databases

Upvotes: 3

Damilola Olowookere
Damilola Olowookere

Reputation: 2383

Although a very old thread, this is still a relevant question in today's programming needs. I am posting this here because none of the answers provided yet is concise, easy, and straight-to-point. This is for sake of Googlers that end up on this page. GUI we go:

  1. Download Sqlitestudio
  2. Add all your database files by using the Ctrl + O keyboard shortcut
  3. Double-click each now-loaded db file to open/activate/expand them all
  4. Fun part: simply right-click on each of the tables and click on Copy, and then go to the target database in the list of the loaded database files (or create new one if required) and right-click on the target db and click on Paste

I was wowed to realize that such a daunting task can be solved using the ancient programming skill called: copy-and-paste :)

Upvotes: 10

Robert Gould
Robert Gould

Reputation: 69825

With no offense, just as one developer to another, I'm afraid that your idea seems terribly inefficient. It seems to me that instead of uniting SQLite databases you should probably be storing several tables within the same Database file.

However if I'm mistaken I guess you could ATTACH the databases and then use a VIEW to simplify your queries. Or make an in-memory table and copy over all the data (but that's even worse performance wise, especially if you have large databases)

Upvotes: -15

Espo
Espo

Reputation: 41919

If you only need to do this merge operation once (to create a new bigger database), you could create a script/program that will loop all your sqlite databases and then insert the data into your main (big) database.

Upvotes: 1

Related Questions