Tim M
Tim M

Reputation: 306

Correct query to get average from top 5 of 7 days?

I'm tracking number of steps/day. I want to get the average steps/day using the 5 best days out of a 7 day period. My end goal is going to be to get an average for the best 5 out of 7 days for a total of 16 weeks.

Here's my sqlfiddle - http://sqlfiddle.com/#!9/5e69bdf/2

Here is the query I'm currently using but I've discovered the result is not correct. It's taking the average of 7 days instead of selecting the 5 days that had the most steps. It's outputting 14,122 as an average instead of 11,606 based on my data as posted in the sqlfiddle.

SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
      JOIN Courses
      WHERE activities.encodedid=? AND activities.activitydate BETWEEN
          DATE_ADD(Courses.Startsemester, INTERVAL $y DAY) AND 
          DATE_ADD(Courses.Startsemester, INTERVAL $x DAY) 
      ORDER BY activities.steps DESC LIMIT 5
      ) a
GROUP BY a.encodedid

Here's the same query with the values filled in for testing:

SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
FROM (SELECT * FROM activities
      JOIN Courses
      WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
          DATE_ADD(Courses.Startsemester, INTERVAL 0 DAY) AND 
          DATE_ADD(Courses.Startsemester, INTERVAL 6 DAY) 
      ORDER BY activities.steps DESC LIMIT 5
      ) a
GROUP BY a.encodedid

Upvotes: 1

Views: 1151

Answers (3)

Nick
Nick

Reputation: 147166

As @SloanThrasher pointed out, the reason the query is not working is because you have multiple rows for the same course in the Courses database which end up being joined to the activities database. Thus the output for the subquery gives the top value (16058) 3 times plus the second highest value (11218) twice for a total of 70610 and an average of 14122. You can work around this by modifying the query as follows:

SELECT SUM(a.steps) as StepsTotal, AVG(a.steps) AS AVGSteps
    FROM (SELECT * FROM activities
          JOIN (SELECT DISTINCT Startsemester FROM Courses) c
          WHERE activities.encodedid='42XPC3' AND activities.activitydate BETWEEN
              DATE_ADD(c.Startsemester, INTERVAL 0 DAY) AND 
              DATE_ADD(c.Startsemester, INTERVAL 6 DAY) 
          ORDER BY CAST(activities.steps AS UNSIGNED) DESC LIMIT 5
          ) a
    GROUP BY a.encodedid

Now since there are actually only 3 days with activity (2018-07-16, 2018-07-17 and 2018-07-18) between the start of semester and 6 days later (2018-07-12 and 2018-07-18) this gives a total of 37533 (16058+11218+10277) and an average of 12517.7.

StepsTotal  AVGSteps
37553       12517.666666666666

Ideally, you probably also want to add a constraint on the Course chosen from Courses e.g. change

(SELECT DISTINCT Startsemester FROM Courses)

to

(SELECT DISTINCT Startsemester FROM Courses WHERE CourseNumber='PHED1164')

Upvotes: 2

Strawberry
Strawberry

Reputation: 33945

Consider the following:

 DROP TABLE IF EXISTS my_table;

CREATE TABLE `my_table` 
(id SERIAL PRIMARY KEY
,steps INT NOT NULL
);

insert into my_table (steps) values
(9),(5),(7),(7),(7),(8),(4);
select prev
    , sum(steps) total
    from (
      select steps
    , case when @prev = grp 
           then @j:=@j+1 else @j:=1 end j
    , @prev:=grp prev
    from (SELECT steps 
               , case when mod(@i,3)=0 
                      then @grp := @grp+1 else @grp:=@grp end grp -- a 3 day week
               , @i:=@i+1 i
            from my_table
               , (select @i:=0,@grp:=0) vars
           order 
              by id) x
, (select @prev:= null, @j:=0) vars
order by grp,steps desc,i) a
where j <=2 -- top 2 (out of 3)
group by prev;

+------+-------+
| prev | total |
+------+-------+
| 1    |    16 |
| 2    |    15 |
| 3    |     4 |
+------+-------+

http://sqlfiddle.com/#!9/ee46d7/11

Upvotes: 0

Michał Turczyn
Michał Turczyn

Reputation: 37367

Try this query:

SELECT @rn := 1, @weekAndYear := 0;

SELECT weekDayAndYear,
       SUM(steps),
       AVG(steps)
FROM (
  SELECT @weekAndYear weekAndYearLag,
         CASE WHEN @weekAndYear = YEAR(activitydate) * 100 + WEEK(activitydate)
           THEN @rn := @rn + 1 ELSE @rn := 1 END rn,
         @weekAndYear := YEAR(activitydate) * 100 + WEEK(activitydate) weekDayAndYear,
         steps,
         lightly_act_min,   
         fairly_act_min,
         sed_act_min,
         vact_min,
         encodedid,
         activitydate,
         username
  FROM activities
  ORDER BY YEAR(activitydate) * 100 + WEEK(activitydate), CAST(steps AS UNSIGNED) DESC
) a WHERE rn <= 5
GROUP BY weekDayAndYear

Demo

With additional variables, I imitate SQL Server ROW_NUMBER function, to number from 1 to 7 days partitioned by weeks. This way I can filter best 5 days and easily get a average grouping by column weekAndDate, which is in the same format as variable: yyyyww (i used integer to avoid casting to varchar).

Upvotes: 1

Related Questions