Reputation: 36070

outer join on same table sql

I am creating a time machine in C#. By "time machine" I mean a backup of my files that lets me access a specific file as it was at a specific time. I do this by scanning all the files inside a directory and storing each file's information in a table named table1. Suppose that the first time I scan my computer I have only 3 files; my table will then look something like:

    ID   FullName   DateModified   DateInsertedToDatabase
     1     C:\A       456588731             0
     2     C:\B       955588762             0
     3     C:\C       854587783             0

Let's say that the next time I perform a backup I still have the same 3 files, but I have created a new file X and modified file C. As a result my table should now look like:

    ID   FullName   DateModified   DateInsertedToDatabase
     1     C:\A       456588731             0
     2     C:\B       955588762             0
     3     C:\C       854587783             0
     4     C:\A       456588731             1
     5     C:\B       955588762             1
     6     C:\C       111122212             1
     7     C:\X       123212321             1

Now I would like to copy file C and file X, because those are the files that have been changed or created. How could I build a query that returns file X and file C? In other words, I want all the files that have DateInsertedToDatabase = 1 and that don't match a row where DateInsertedToDatabase is less than 1.

In case I am not being clear, here is the continuation of my example: let's say I delete files B and C, modify file X, and create a new file Z. My table should look like:

    ID   FullName   DateModified   DateInsertedToDatabase
     1     C:\A       456588731             0
     2     C:\B       955588762             0
     3     C:\C       854587783             0
     4     C:\A       456588731             1
     5     C:\B       955588762             1
     6     C:\C       111122212             1
     7     C:\X       123212321             1
     8     C:\A       456588731             2
     9     C:\X       898989898             2
     10    C:\Z       789564545             2

Here I would like to get files X and Z, because file X was modified and file Z was created. I do not want to get file A, because that file already exists with the same DateModified. How could I build that query?

Upvotes: 0

Answers (4)

Tono Nam

Reputation: 36070

I modified the solution because I am working with a lot of files: it works great, but it is slow for queries dealing with a lot of records. Here is what I worked out.

Let's assume I have these records so far:

[image not preserved: contents of table1]

SELECT * FROM table1
WHERE DateInserted = 4
  AND Path NOT IN (
        SELECT Path FROM table1
        WHERE DateInserted = 4
          AND Path IN (SELECT Path FROM table1 WHERE DateInserted < 4)
          AND DateModified IN (SELECT DateModified FROM table1 WHERE DateInserted < 4)
    )

and that returns:

[image not preserved: query result]

This query runs much faster. In my code I will obviously replace the 4 with a variable; it is hard-coded here only to illustrate the change.
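For anyone who wants to try this, here is a minimal sketch of the same query with the hard-coded 4 replaced by a bound parameter. It assumes SQLite accessed through Python's sqlite3 module, and it assumes the question's sample rows mapped onto the Path / DateModified / DateInserted column names used above; the helper name changed_paths and the parameter name :snapshot are made up for the illustration.

```python
import sqlite3

# Sample rows from the question, mapped onto this answer's column names
# (Path, DateModified, DateInserted). The mapping is an assumption.
ROWS = [
    ("C:\\A", 456588731, 0), ("C:\\B", 955588762, 0), ("C:\\C", 854587783, 0),
    ("C:\\A", 456588731, 1), ("C:\\B", 955588762, 1), ("C:\\C", 111122212, 1),
    ("C:\\X", 123212321, 1),
    ("C:\\A", 456588731, 2), ("C:\\X", 898989898, 2), ("C:\\Z", 789564545, 2),
]

# Same shape as the query above; :snapshot stands in for the literal 4.
QUERY = """
SELECT Path FROM table1
WHERE DateInserted = :snapshot
  AND Path NOT IN (
        SELECT Path FROM table1
        WHERE DateInserted = :snapshot
          AND Path IN (SELECT Path FROM table1 WHERE DateInserted < :snapshot)
          AND DateModified IN (SELECT DateModified FROM table1
                               WHERE DateInserted < :snapshot)
    )
ORDER BY Path
"""


def changed_paths(rows, snapshot):
    """Return the paths the query reports as new or modified in `snapshot`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (ID INTEGER PRIMARY KEY, Path TEXT,"
        " DateModified INTEGER, DateInserted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO table1 (Path, DateModified, DateInserted) VALUES (?, ?, ?)",
        rows,
    )
    result = [row[0] for row in conn.execute(QUERY, {"snapshot": snapshot})]
    conn.close()
    return result


print(changed_paths(ROWS, 2))
```

One thing to keep in mind: the two IN tests are independent of each other, so a modified file whose new DateModified happens to equal the earlier DateModified of some other file would be skipped. With timestamps that is probably rare, but it is worth knowing about.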

Upvotes: 0

pilcrow

Reputation: 58741

Phil Sandler's answer works. This does, too:

    SELECT FullName
      FROM table1
INNER JOIN (SELECT FullName, DateModified
              FROM table1
             WHERE DateInsertedToDatabase = (SELECT MAX(DateInsertedToDatabase) FROM table1)) d
     USING (FullName, DateModified)
  GROUP BY FullName
    HAVING COUNT(1) = 1
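
A quick way to check this against the question's sample data is sketched below. It assumes SQLite through Python's sqlite3 module and a table1 with the column names from the question; the helper name changed_files is hypothetical, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Sample rows from the question: (FullName, DateModified, DateInsertedToDatabase).
ROWS = [
    ("C:\\A", 456588731, 0), ("C:\\B", 955588762, 0), ("C:\\C", 854587783, 0),
    ("C:\\A", 456588731, 1), ("C:\\B", 955588762, 1), ("C:\\C", 111122212, 1),
    ("C:\\X", 123212321, 1),
    ("C:\\A", 456588731, 2), ("C:\\X", 898989898, 2), ("C:\\Z", 789564545, 2),
]

# Join every row to the latest snapshot on (FullName, DateModified). A file
# whose (name, timestamp) pair occurs exactly once exists only in the latest
# snapshot in that state, so it is new or modified.
QUERY = """
SELECT FullName
FROM table1
INNER JOIN (SELECT FullName, DateModified
            FROM table1
            WHERE DateInsertedToDatabase =
                  (SELECT MAX(DateInsertedToDatabase) FROM table1)) d
USING (FullName, DateModified)
GROUP BY FullName
HAVING COUNT(1) = 1
ORDER BY FullName
"""


def changed_files(rows):
    """Load rows into an in-memory table1 and run the query above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (ID INTEGER PRIMARY KEY, FullName TEXT,"
        " DateModified INTEGER, DateInsertedToDatabase INTEGER)"
    )
    conn.executemany(
        "INSERT INTO table1 (FullName, DateModified, DateInsertedToDatabase)"
        " VALUES (?, ?, ?)",
        rows,
    )
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(changed_files(ROWS))
```

With the three snapshots from the question this should report C:\X and C:\Z; with only the first two snapshots loaded (ROWS[:7]) it should report C:\C and C:\X.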

Upvotes: 0

jlnorsworthy

Reputation: 3974

I don't know SQLite, but I hope this will work anyway. It doesn't use anything fancy.

Select t1.* 
From Table1 t1
Left join Table1 t2
On t1.FullName = t2.FullName
And t1.DateInsertedToDatabase = t2.DateInsertedToDatabase + 1
Where t1.DateInsertedToDatabase = (select max(DateInsertedToDatabase) from Table1)
And (t1.DateModified <> t2.DateModified or t2.FullName is null)

Joining on DateInsertedToDatabase + 1 will join with the previous record. Then you filter for the highest DateInsertedToDatabase and include either records that don't have a match (they are new) or where the modified dates don't match.
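
As a rough check that the join behaves as described, here is a small sketch that runs it on the question's sample rows. It assumes SQLite through Python's sqlite3 module; the helper name changed_files is made up, and the query selects only FullName and adds an ORDER BY to keep the output short and stable.

```python
import sqlite3

# Sample rows from the question: (FullName, DateModified, DateInsertedToDatabase).
ROWS = [
    ("C:\\A", 456588731, 0), ("C:\\B", 955588762, 0), ("C:\\C", 854587783, 0),
    ("C:\\A", 456588731, 1), ("C:\\B", 955588762, 1), ("C:\\C", 111122212, 1),
    ("C:\\X", 123212321, 1),
    ("C:\\A", 456588731, 2), ("C:\\X", 898989898, 2), ("C:\\Z", 789564545, 2),
]

# t2 is the same file in the immediately preceding snapshot, if it exists.
QUERY = """
SELECT t1.FullName
FROM Table1 t1
LEFT JOIN Table1 t2
       ON t1.FullName = t2.FullName
      AND t1.DateInsertedToDatabase = t2.DateInsertedToDatabase + 1
WHERE t1.DateInsertedToDatabase = (SELECT MAX(DateInsertedToDatabase) FROM Table1)
  AND (t1.DateModified <> t2.DateModified OR t2.FullName IS NULL)
ORDER BY t1.FullName
"""


def changed_files(rows):
    """Load rows into an in-memory Table1 and run the query above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, FullName TEXT,"
        " DateModified INTEGER, DateInsertedToDatabase INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Table1 (FullName, DateModified, DateInsertedToDatabase)"
        " VALUES (?, ?, ?)",
        rows,
    )
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(changed_files(ROWS))
```

Because the comparison is only against the immediately preceding snapshot, this relies on DateInsertedToDatabase increasing by exactly 1 per backup. A file that was deleted and later restored with its old timestamp would be reported as new.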

Upvotes: 0

Phil Sandler

Reputation: 28046

Hmm, I think I understand. You want to get all files that match on the MAX(DateInsertedToDatabase) but don't have a previous row that also matches their DateModified?

You want to do what I call a "reverse inner join." Basically a left join that filters out anything that would have successfully matched in an inner join. There are other ways it could be done as well (e.g. using subqueries).

This is in T-SQL:

CREATE TABLE #mytemp
(
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [FullName] [nvarchar](50) NOT NULL,
    DateModified [nvarchar](9) NOT NULL, 
    DateInsertedToDatabase [int] NOT NULL
)

INSERT INTO #mytemp VALUES ('C:\A', '456588731', 0)
INSERT INTO #mytemp VALUES ('C:\B', '955588762', 0)
INSERT INTO #mytemp VALUES ('C:\C', '854587783', 0)

INSERT INTO #mytemp VALUES ('C:\A', '456588731', 1)
INSERT INTO #mytemp VALUES ('C:\B', '955588762', 1)
INSERT INTO #mytemp VALUES ('C:\C', '111122212', 1)
INSERT INTO #mytemp VALUES ('C:\X', '123212321', 1)

INSERT INTO #mytemp VALUES ('C:\A', '456588731', 2)
INSERT INTO #mytemp VALUES ('C:\X', '898989898', 2)
INSERT INTO #mytemp VALUES ('C:\Z', '789564545', 2)

SELECT 
    temp1.*
FROM 
    #mytemp temp1
    LEFT JOIN #mytemp temp2 ON 
            temp1.ID != temp2.ID --don't match on the same two rows
            AND temp1.FullName = temp2.FullName --match based on full name
            AND temp1.DateModified = temp2.DateModified --and date modified
WHERE
    temp1.DateInsertedToDatabase = (SELECT MAX(DateInsertedToDatabase) FROM #mytemp)
    AND temp2.ID IS NULL --filter out rows that would have matched on an INNER JOIN 

 DROP TABLE #mytemp
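
Since the question is about SQLite rather than SQL Server, here is a minimal sketch of the same left-join-and-filter idea ported to SQLite and driven from Python's sqlite3 module. The table name mytemp (without the #), the integer column types, and the helper name changed_files are assumptions made for the port, not part of the T-SQL above.

```python
import sqlite3

# Same sample rows as the T-SQL inserts above:
# (FullName, DateModified, DateInsertedToDatabase).
ROWS = [
    ("C:\\A", 456588731, 0), ("C:\\B", 955588762, 0), ("C:\\C", 854587783, 0),
    ("C:\\A", 456588731, 1), ("C:\\B", 955588762, 1), ("C:\\C", 111122212, 1),
    ("C:\\X", 123212321, 1),
    ("C:\\A", 456588731, 2), ("C:\\X", 898989898, 2), ("C:\\Z", 789564545, 2),
]

# Left join each row to any *other* row with the same name and timestamp,
# then keep only latest-snapshot rows for which no such partner exists.
QUERY = """
SELECT temp1.FullName
FROM mytemp temp1
LEFT JOIN mytemp temp2
       ON temp1.ID != temp2.ID
      AND temp1.FullName = temp2.FullName
      AND temp1.DateModified = temp2.DateModified
WHERE temp1.DateInsertedToDatabase = (SELECT MAX(DateInsertedToDatabase) FROM mytemp)
  AND temp2.ID IS NULL
ORDER BY temp1.FullName
"""


def changed_files(rows):
    """Load rows into an in-memory mytemp table and run the query above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytemp (ID INTEGER PRIMARY KEY, FullName TEXT NOT NULL,"
        " DateModified INTEGER NOT NULL, DateInsertedToDatabase INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO mytemp (FullName, DateModified, DateInsertedToDatabase)"
        " VALUES (?, ?, ?)",
        rows,
    )
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(changed_files(ROWS))
```

Note that the join matches against every earlier snapshot, not just the previous one. So a file that returns to a (FullName, DateModified) pair already stored in some older snapshot is not reported as changed, which is probably what you want if the old copy is still in the backup.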

Upvotes: 2
