Peter
Peter

Reputation: 427

Explanation of an SQL query

I recently got some help for an oracle query and don't quite understand how it works and thus can't get it to work with my data. Is anyone able to explain the logic of what is happening in logical steps and what variables are actually taken from an existing table's columns? I am looking to select data from a table of readings (column names are: day, hour, volume) and find the average reading of volume for each hour of each day (thus GROUP BY day, hour), by going back to all readings for that hour/day combination in the past (as far back as my dataset goes) and writing out the average for it. Once that is done, it will write the results to a different table with the same column names (day, hour, volume). Except when I write it back on a per hour basis, 'volume' will be the average for that hour of the day in the past. For example, I want to find what the average was for all Wednesdays at 7pm in the past, and output the average to a new record. Assuming these 3 columns were used and in reference to the code below, I am not sure how "hours" differs to "hrs" and what the t1 variable represents. Any help is appreciated.

INSERT INTO avg_table (days, hours, avrg)
   WITH xweek
        AS (SELECT ds, LPAD (hrs, 2, '0') hrs
              FROM (    SELECT LEVEL ds
                          FROM DUAL
                    CONNECT BY LEVEL <= 7),
                   (    SELECT LEVEL - 1 hrs
                          FROM DUAL
                    CONNECT BY LEVEL <= 24))
     SELECT t1.ds, t1.hrs, AVG (volume)
       FROM xweek t1, tables t2
      WHERE     t1.ds = TO_CHAR (t2.day(+), 'D')
            AND t1.hrs = t2.hour(+)
   GROUP BY t1.ds, t1.hrs;

Upvotes: 0

Views: 133

Answers (1)

Ben
Ben

Reputation: 52863

I'd re-write this slightly so it makes more sense (to me at least).

To break it down bit by bit, CONNECT BY is a hierarchical (recursive) query. This is a common "cheat" to generate rows. In this case 7 to represent each day of the week, numbered 1 to 7.

 SELECT LEVEL ds
   FROM DUAL
CONNECT BY LEVEL <= 7

The next one generates the hours 0 to 23 to represent midnight to 11pm. These are then joined together in the old style in a Cartesian or CROSS JOIN. This means that every possible combination of rows is returned, i.e. it generates every hour of every day for a single week.

The WITH clause is described in the documentation on the SELECT statement, it is commonly known as a Common Table Expression (CTE), or in Oracle the Subquery Factoring Clause. This enables you to assign a name to a sub-query and reference that single sub-query in multiple places. It can also be used to keep code clean or generate temporary tables in memory for ready access. It's not required in this case but it does help to separate the code nicely.

Lastly, the + is Oracle's old notation for outer joins. They are mostly equivalent but there are a few very small differences that are described in this question and answer.

As I said at the beginning I would re-write this to conform to the ANSI standard because I find it more readable

insert into avg_table (days, hours, avrg)
with xweek as (
select ds, lpad(hrs, 2, '0') hrs
  from ( select level ds 
           from dual 
        connect by level <= 7 )
 cross join ( select level - 1 hrs 
                from dual 
             connect by level <= 24 )
       )
select t1.ds, t1.hrs, avg(volume)
 from xweek t1
 left outer join tables t2
   on t1.ds = to_char(t2.day, 'd')
  and t1.hrs = t2.hour
group by t1.ds, t1.hrs;

To go into slightly more detail the t1 variable represents an alias for the CTE week1, it's so you don't have to type the entire thing each time. hrs is an alias for the generated expression, as you reference it explicitly you need to call it something. HOURS is a column in your own table.

As to whether this is doing the correct thing I'm not sure, you imply you only want it for a single day rather than the entire week so only you can decide if this is correct? I also find it a little strange that you need the HOURS column in your table to be a character left-padded with 0s lpad(hrs, 2, '0'), once again, only you know if this is correct.

I would highly recommend playing about with this yourself and working out how everything goes together. You also seem to be missing some of the basics, get a text book or look around on the internet, or Stack Overflow, there's plenty of examples.

Upvotes: 1

Related Questions