Vi. Bu.
Vi. Bu.

Reputation: 23

How to compare data from 2 databases with different primary and foreign keys?

I need to compare 2 databases on their datas. But I have to compare only the data, not the keys, which may be different (but the relations between objets must be the same).

Actually it is a need to verify the proper functioning of a data conversion tool.

Simple example:

I use an entity-framework on C# .Net and Sqlite, here's the database schema:

public partial class Document
{
    [PrimaryKey, AutoIncrement]
    public Int64 idDocument { get; set; }
    public string name { get; set; }
}

public partial class Diagram
{
    [PrimaryKey, AutoIncrement]
    public Int64 idDiagram { get; set; }
    public string title { get; set; }
}

public partial class Document_contains_diagram
{
    public Int64 idDocument { get; set; }
    public Int64 idDiagram { get; set; }
}

The database :

Tables in order : Document - Diagram - Document_contains_diagram

idDocument name idDiagram title idDocument idDiagram
1 "myFirstDocument" 1 "evolution" 1 1
2 "mySecondDocument" 2 "futur" 2 2

Should be considered as same as:

idDocument name idDiagram title idDocument idDiagram
3 "myFirstDocument" 8 "evolution" 3 8
4 "mySecondDocument" 6 "futur" 4 6

It's an example, the true database I want to compare contains more than 50 tables and often more than 10000 entries (RAM limit is a contraint too).
So I'm looking for an generic algorythm, or a tool to compare theses 2 databases that might help me in my search. All tools I've seen only check for strict equality of the tables. Maybe converting the database into a graph would works ?

Upvotes: 0

Views: 886

Answers (1)

btilly
btilly

Reputation: 46507

I don't know of an automated tool to do this, but I would recommend building a one-off solution for finding a canonical arrangement of the data. Then comparing canonical versions.

What I mean is this. You'd export all of the data, and import it into SQLite. Then for each table you define a canonical order. You then migrate IDs into canonical ones, then migrate all of the IDs. Repeat.

Much of the structure can be automatically generated. But I'd recommend automatically generating a configuration that you then fix for the weird edge cases that always exist.

In your case the configuration might look something like this:

[
    # table, column, orderBy, foreign references
    # each order by is [column list]
    # each foreign reference is [table, column]
    ["Document", "idDocument", ["name"], [["Document_contains_diagram", "idDocument"]]],
    ["Diagram.idDiagram", ["title"], [["Document_contains_diagram", "idDiagram"]]],
    ...
]

What your tool would do with it is:

For each entry:
    Create a conversion table with a query like:

        create table tmp_convert as (
            select {table}.{id_column} as old_id,
                row_number() over (order by {columns here}) as new_id
            from {table}
        );

    Add an index to tmp_convert.old_id

    Update {table}.{id_column} using tmp_convert

    for each foreign reference:
        Update foreign reference using tmp_convert

    drop tmp_convert

So in your example it would take the first database, and after a bunch of work find that it was in canonical order. It would take the second database, and for idDocument would replace 3 with 1 everywhere, and 4 with 2. For idDiagram it would replace 8 with 1 and 6 with 2. Once it did this, the two databases are now identical, so they would match.

Upvotes: 0

Related Questions