MattSizzle
MattSizzle

Reputation: 3175

Mysql duplicate row deletion with Perl DBI across two tables

This one is a pretty good one IMO and I have not seen a close exampled on SO or Google so here you go. I need to do the following within a Perl application I am building. Unfortunately it can not be done directly in MySQL and will require DBI. In a nutshell I need to take Database1.tableA and locate every record with the column 'status' matching 'started'. This I can do as it is fairly easy (not very good with DBI yet, but have read the docs), but where I am having issues is what I have to do next.

my $started_query = "SELECT primary_ip FROM queue WHERE status='started'";
        my $started = $dbh->prepare($started_query);
        $started->execute();

        while ( my @started = $started->fetchrow_array() ) {
     # Where I am hoping to have the following occur so it can go by row
     # for only rows with the status 'started'
}

So for each record in the @started array, really only contains one value per iteration of the while loop, I need to see if it exists in the Database2.tableA and IF it does exist in the other database (Database2.tableA) I need to delete it from Database1.tableA, but if it DOES NOT exist in the other database (Database2.tableA) I need to update the record in the current database (Database1.tableA).

Basically replicating the below semi-valid MySQL syntax.

DELETE FROM tableA WHERE primary_ip IN (SELECT primary_ip FROM db2.tablea) OR UPDATE tableA SET status = 'error'

I am limited to DBI to connect to the two databases and the logic is escaping me currently. I could do the queries to both databases and store in @arrays and then do a comparison, but that seems redundant as I think it should be possible within the while ( my @started = $started->fetchrow_array() ) as that will save on runtime and resources required. I am also not familiar enough with passing variables between DBI instances and as the @started array will always contain the column value I need to query for and delete I would like to take full advantage of having that defined and passed to the DBI objects.

I am going to be working on this thing all night and already ran through a couple pots of coffee so your helping me understand this logic is greatly appreciated.

Upvotes: 0

Views: 505

Answers (1)

Michael - sqlbot
Michael - sqlbot

Reputation: 179124

You'll be better off with fetchrow_hashref, which returns a hashref of key/value pairs, where the keys are the column names, rather than coding based on columns showing up at ordinal positions in the array.

You need an additional database handle to do the lookups and updates because you've got a live statement handle on the first one. Something like this:

my $dbh2 = DBI->connect(...same credentials...);

...

while(my $row = $started->fetchrow_hashref)
{
    if(my $found = $dbh2->selectrow_hashref("SELECT * FROM db2.t2 WHERE primary_ip = ?",undef,$row->{primary_ip}))
    {
        $dbh2->do("DELETE FROM db1.t1 WHERE primary_ip = ?",undef,$found->{primary_ip});
    }
    else
    {
        $dbh2->do("UPDATE db1.t1 SET status = 'error' WHERE primary_ip = ?",undef,$found->{primary_ip}");
    }

}

Technically, I don't "need" to fetch the row from db2.t2 into my $found since you're only apparently testing for existence, there are other ways, but using it here is a bit of insurance against doing something other than you intended, since it will be undef if we somehow get some bad logic going and that should keep us from making some potential wrong changes.

But approaching a relational database with loop iterations is rarely the best tactic.

This "could" be done directly in MySQL with just a couple of queries.

First, the updates, where t1.status = 'started' and t2.primary_ip has no matching value for t1.primary_ip:

UPDATE db1.t1 a LEFT JOIN db2.t2 b ON b.primary_ip = a.primary_ip
   SET a.status = 'error' 
 WHERE b.primary_ip IS NULL AND a.status = 'started';

If you are thinking "but b.primary_ip is never null" ... well, it is null in a left join where there are no matching rows.

Then deleting the rows from t1 can also be accomplished with a join. Multi-table joins delete only the rows from the table aliases listed between DELETE and FROM. Again, we're calling "t1" by the alias "a" and t2 by the alias "b".

DELETE a
  FROM db1.t1 a JOIN db2.t2 b ON a.primary_ip = b.primary_ip
 WHERE a.status = 'started'; 

This removes every row from t1 ("a") where status = 'started' AND where a matching row exists in t2.

Upvotes: 1

Related Questions