Reputation: 78
I'm reverse engineering the relationships between a medium-sized number of tables (50+) in an Oracle database where there are no foreign keys defined between the tables. I can count (somewhat) on being able to match column names across tables. For example, column name "SomeDescriptiveName" is probably the same across the set of tables.
What I would like to be able to do is to find a better way of extracting some set of relationships based on those matching column names than manually going through the tables one by one. I could do something with Java DatabaseMetaData methods but it seems like this is one of those tasks that someone has probably had to script before. Maybe extract the columns names with Perl or some other scripting lang, use the column names as a hash key and add tables to an array pointed to by the hash key?
Anyone have any tips or suggestions that might make this simpler or provide a good starting point? It's an ugly need, if foreign keys had already been defined, understanding the relationships would have been much easier.
Thanks.
Upvotes: 1
Views: 2104
Reputation: 66661
You can use a combination of three (or four) approaches, depending on how obfuscated the schema is:
Upvotes: 1
Reputation: 1679
This is an interesting question. The approach I took was a brute force search for columns that matched types and values for a small sample set. You'll probably have to tweak the heuristics to provide good results for your schema. I ran this on a schema that didn't use auto-incremented keys and it worked well. The code is written for MySQL, but it's very easy to adapt to Oracle.
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect("dbi:mysql:host=localhost;database=SCHEMA", "USER", "PASS");
my @list;
foreach my $table (show_tables()) {
foreach my $column (show_columns($table)) {
push @list, { table => $table, column => $column };
}
}
foreach my $m (@list) {
my @match;
foreach my $f (@list) {
if (($m->{table} ne $f->{table}) &&
($m->{column}{type} eq $f->{column}{type}) &&
(samples_found($m->{table}, $m->{column}{name}, $f->{column}{samples})))
{
# For better confidence, add other heuristics such as
# joining the tables and verifying that every value
# appears in the master. Also it may be useful to exclude
# columns in large tables without an index although that
# heuristic may fail for composite keys.
#
# Heuristics such as columns having the same name are too
# brittle for many of the schemas I've worked with. It may
# be too much to even require identical types.
push @match, "$f->{table}.$f->{column}{name}";
}
}
if (@match) {
print "$m->{table}.$m->{column}{name} $m->{column}{type} <-- @match\n";
}
}
$dbh->disconnect();
exit;
sub show_tables {
my $result = query("show tables");
return ($result) ? @$result : ();
}
sub show_columns {
my ($table) = @_;
my $result = query("desc $table");
my @columns;
if ($result) {
@columns = map {
{ name => $_->[0],
type => $_->[1],
samples => query("select distinct $_->[0] from $table limit 10") }
} @$result;
}
return @columns;
}
sub samples_found {
my ($table, $column, $samples) = @_;
foreach my $v (@$samples) {
my $result = query("select count(1) from $table where $column=?", $v);
if (!$result || $result->[0] == 0) {
return 0;
}
}
return 1;
}
sub query {
my ($sql, @binding) = @_;
my $result = $dbh->selectall_arrayref($sql, undef, @binding);
if ($result && $result->[0] && @{$result->[0]} == 1) {
foreach my $row (@$result) {
$row = $row->[0];
}
}
return $result;
}
Upvotes: 0
Reputation: 562260
My strategy would be to use the Oracle system catalog to find columns that are the same in column name and data type but different in table name. Also which one of the columns is part of a table's primary or unique key.
Here's a query that may be close to doing this, but I don't have an Oracle instance handy to test it:
SELECT col1.table_name || '.' || col1.column_name || ' -> '
|| col2.table_name || '.' || col2.column_name
FROM all_tab_columns col1
JOIN all_tab_columns col2
ON (col1.column_name = col2.column_name
AND col1.data_type = col2.data_type)
JOIN all_cons_columns cc
ON (col2.table_name = cc.table_name
AND col2.column_name = cc.column_name)
JOIN all_constraints con
ON (cc.constraint_name = con.constraint_name
AND cc.table_name = con.table_name
AND con.constraint_type IN ('P', 'U')
WHERE col1.table_name != col2.table_name;
Of course this won't get any case of columns that are related but have different names.
Upvotes: 1
Reputation: 124277
You pretty much wrote the answer in your question.
my %column_tables;
foreach my $table (@tables) {
foreach my $column ($table->columns) {
push @{$column_tables[$column]}, $table;
}
}
print "Likely foreign key relationships:\n";
foreach my $column (keys %column_tables) {
my @tables = @{$column_tables[$column]};
next
if @tables < 2;
print $column, ': ';
foreach my $table (@tables) {
print $table->name, ' ';
}
print "\n";
}
Upvotes: 1