Javatar
Javatar

Reputation: 665

Complex SQL query (suggestions needed)

Could someone throw me ideas about how to resolve the following issue:

I have pictures, videos and tags in my database. Tags can be associated to pictures and videos. I have to query database for tag IDs and count that are modified before 7 days or earlier and sort them by count of associations they have to picture and video assets.

So the idea is that at the end I can output the tags that were associated to pictures or videos the most during last 7 days. I wrote down the DB structure with the fields I have

VIDEO
    ID

PICTURE
    ID

PICTURE_ATTRMAPPING
    CS_OWNERID  (ID of picture)
    CS_ATTRID  ( will store ID of the tag attribute (picture_tag or video_tag))
    ASSETVALUE ( will store ID OF THE associated tag)

VIDEO_ATTRMAPPING
    CS_OWNERID  (ID of video)
    CS_ATTRID  ( will store ID of the tag attribute (picture_tag or video_tag))
    ASSETVALUE ( will store ID OF THE associated tag)

TAG
    ID
    UPDATEDATE

ATTRIBUTES (picture_tag attribute ID is stored here)
    ID
    NAME

So we can see that tags, videos, picture and attributes are stored in separate tables. We can tell that tag is referenced by video or picture (in other words, video/picture has a tag association) if VIDEO/PICTURE_ATTRMAPPING table has record with video or picture tag attribute id in CS_ATTRID column and id of the tag in ASSETVALUE column.

I am assuming it will be query with sub-queries, so I started to break this task in sub-tasks and figure out how to obtain all the information that's needed.

I will definitely have to get the ID of the tag attributes for video and picture objects:

SELECT id FROM ATTRIBUTES WHERE NAME = 'picture_tag' OR NAME = 'video_tag'

Also an example of how I can query for tags that are n days old:

SELECT id FROM TAG WHERE updateddate BETWEEN TO_DATE('2013-08-20 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND CURRENT_DATE

This probably isn't that complex as it seems, however, I am not sure about how the whole query should look like and with what to start. Could someone throw an idea or sample?

I am using oracle, but am familiar with mysql too, so samples from any of the DBMSs would be great. Let me know if I made myself clear enough.

Upvotes: 0

Views: 116

Answers (2)

schurik
schurik

Reputation: 7928

all tag ids assigned to the pictures:

SELECT ASSETVALUE  
            FROM PICTURE_ATTRMAPPING 
                 JOIN ATTRIBUTES 
                 ON (      ATTRIBUTES.ID = PICTURE_ATTRMAPPING.CS_ATTRID 
                      AND  ATTRIBUTES.NAME = 'picture_tag'
                    ) 

all tag ids assigned to the videos:

    SELECT ASSETVALUE  
    FROM   VIDEO_ATTRMAPPING 
           JOIN ATTRIBUTES 
           ON (      ATTRIBUTES.ID = PICTURE_ATTRMAPPING.CS_ATTRID 
                AND  ATTRIBUTES.NAME = 'video_tag'
              )

use OUTER JOIN to count pictures and videos wich are tagged

SELECT 
  TAG.ID
, SUM(CASE WHEN PT.ASSETVALUE is not NULL THEN 1 ELSE 0 END) as tagged_picture_number
, SUM(CASE WHEN VT.ASSETVALUE is not NULL THEN 1 ELSE 0 END) as tagged_vieos_number
  FROM 
    TAG
    LEFT OUTER JOIN 
      (
        SELECT ASSETVALUE  
        FROM PICTURE_ATTRMAPPING 
             JOIN ATTRIBUTES 
             ON ( ATTRIBUTES.ID = PICTURE_ATTRMAPPING.CS_ATTRID AND  ATTRIBUTES.NAME = 'picture_tag') 
      ) PT
      ON ( PT.ASSETVALUE = TAG.ID)
    LEFT OUTER JOIN 
      (
        SELECT ASSETVALUE  
        FROM   VIDEO_ATTRMAPPING 
               JOIN ATTRIBUTES 
               ON ( ATTRIBUTES.ID = PICTURE_ATTRMAPPING.CS_ATTRID AND  ATTRIBUTES.NAME = 'video_tag') 
      ) VT
      ON ( PT.ASSETVALUE = TAG.ID)
WHERE
    TAG.UPDATEDATE <= TRUNC(SYSDATE, 'DD') - 7
group by TAG.ID
order by tagged_picture_number + tagged_vieos_number DESC
;

Upvotes: 0

Ben
Ben

Reputation: 52853

The tags that were updated in the last 7 days can be written as:

select id from tags where updatedate >= sysdate - 7

From what you're saying the ATTRIBUTES table only has two values in it; so you can ignore it. Even if it has more your tables PICTURE_ATTRMAPPING and VIDEO_ATTRMAPPING ensure that the attributes are enforced when you join to them so this doesn't matter either.

If you want a tag to be associated to a picture or a video you just need to enforce that it exists in one of the *_ARRTMAPPING tables:

select *
  from tags t
  left outer join picture_attrmappings pa
    on t.id = pa.assetvalue
  left outer join video_attrmappings va
    on t.id = va.assetvalue
 where t.updatedate >= sysdate - 7
   and ( pa.assetvalue is not null
         or va.assetvalue is not null
         )

You then want the tag IDs that have been modified most; so you need to ORDER BY the count:

select t.id
  from tags t
  left outer join picture_attrmappings pa
    on t.id = pa.assetvalue
  left outer join video_attrmappings va
    on t.id = va.assetvalue
 where t.updatedate >= sysdate - 7
   and ( pa.assetvalue is not null
         or va.assetvalue is not null
         )
 group by t.id
 order by count(*) desc

On a slightly different note this is quite a strange schema (assuming you've left nothing out). I would expect PICTURE_ATTRMAPPINGS to be a junction table between TAGS and PICTURE. The TAGS table should store a unique list of tags but this doesn't seem to be what it does. The UPDATEDATE should then be in either PICTURE_ARRTMAPPINGS as you can then know when each tag was last updated or in PICTURE so you know when all tags were last updated (or both).

I suspect you're missing some of the schema but I see no method, with what you've provided, of using either PICTURE or VIDEO.

Upvotes: 1

Related Questions