conbask
conbask

Reputation: 10071

Consolidating MySQL queries

I am wondering how 4 queries I use can be consolidated into one query. I was able to query for all days represented in my dataset and the number of clicks for each specific day. Ideally I would be able to get the number of installs (for each day), total cost (for each day) and notes (for each day) from that query as well.

I have thought about how to achieve this, but have not come up with any complete solutions as I am a novice with MySQL. Is there a good way to pull data from multiple tables?

$days = mysqli_query($link, "
    SELECT
      t1.date,
      t1.clicks
    FROM (SELECT
        date_format(date_sub(ic.click_utc, INTERVAL 7 HOUR), '%Y-%m-%d') as date,
        count(distinct ic.txid) as clicks
          FROM users_clicks ic
          GROUP BY date_format(date_sub(click_utc, INTERVAL 7 HOUR), '%Y-%m-%d')
          ORDER BY date DESC) t1
    WHERE date >= '2014-06-28'");
while ($day = mysqli_fetch_assoc($days)) {
    $date = $day['date'];
    $day_clicks = $day['clicks'];
    $day_installs = mysqli_fetch_row(mysqli_query($link, sprintf("SELECT count(txid) FROM (SELECT txid FROM users_installs WHERE date_format(date_sub(click_utc, INTERVAL 7 HOUR), '%%Y-%%m-%%d') = '%s' GROUP BY txid) table1", $day['date'])));
    $day_cost = mysqli_fetch_row(mysqli_query($link, sprintf("SELECT sum(earnings) FROM (SELECT max(cost) as earnings FROM users_clicks WHERE date_format(date_sub(click_utc, INTERVAL 7 HOUR), '%%Y-%%m-%%d') = '%s' GROUP BY txid) table1", $day['date'])));
    $note = mysqli_fetch_assoc(mysqli_query($link, sprintf("SELECT * FROM reporting_notes WHERE date_format(date_sub(timestamp, INTERVAL 7 HOUR), '%%Y-%%m-%%d') = '%s' LIMIT 1", $day['date'])));
}

Here are the structures of the tables:

Users_clicks:

   txid  |       click_utc        |  cost  
   12t3  |   2014-08-19 07:08:47  |  0.50
   27a5  |   2014-08-18 03:28:03  |  0.25
   48a5  |   2014-08-17 12:55:23  |  0.25

Users_installs:

   txid  |       click_utc 
   1o23  |   2014-08-19 07:08:47
   1ee3  |   2014-08-17 11:10:53

Reporting_notes:

      timestamp        |  note  
  2014-08-19 07:08:47  |  "hey"
  2014-08-17 03:02:41  |  "hey"

desired output: number of clicks, number of installs, sum cost and any notes for each individual day that is in the dataset

Upvotes: 2

Views: 133

Answers (2)

TheConstructor
TheConstructor

Reputation: 4465

First off: MySQL is able to handle named time-zones, but you need to populate the information tables on time-zones first. See to documentation for CONVERT_TZ as a starting point.

As your tables seem to be unrelated (i.e. there is no master-table containing all txids, ...) you will probably be best off using sub-queries which are close to firing separate queries in the beginning.

Clicks and cost per day:

SELECT
  DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date,
  COUNT(*)                                        AS clicks,
  SUM(cost)                                       AS costs
FROM users_clicks
GROUP BY DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00'));
Only counting each txid once
SELECT
  DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00')) AS date,
  COUNT(*)                                            AS clicks,
  SUM(max_costs)                                      AS costs
FROM (SELECT
        txid,
        MIN(click_utc) AS min_click_utc,
        MAX(cost)      AS max_costs
      FROM users_clicks
      GROUP BY txid) distinct_txids
GROUP BY DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00'))

Installs per day:

SELECT
  DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date,
  COUNT(*)                                        AS installs
FROM users_installs
GROUP BY DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00'));
Only counting each txid once
SELECT
  DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00')) AS date,
  COUNT(*)                                            AS installs
FROM (SELECT
        txid,
        MIN(click_utc) AS min_click_utc
      FROM users_installs
      GROUP BY txid) distinct_txids
GROUP BY DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00'));

Notes per day

(note that GROUP_CONCAT will by default only return 1024 characters. This can be changed by setting group_concat_max_len to a higher value):

SELECT
  DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date,
  COUNT(*)                                        AS note_count,
  GROUP_CONCAT(note SEPARATOR ', ')               AS notes
FROM reporting_notes
GROUP BY DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00'));

If you want to get everything at once you need to first get a common date-pool As MySQL only has LEFT or RIGHT OUTER JOINs, but not FULL OUTER JOINs:

SELECT DISTINCT
  date
FROM (
       SELECT
         DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
       FROM users_clicks
       UNION SELECT
               DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
             FROM users_installs
       UNION SELECT
               DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date
             FROM reporting_notes) dates;

and then you can combine it all into

one query

SELECT
  dates.date,
  clicks.clicks,
  clicks.costs,
  installs.installs,
  notes.note_count,
  notes.notes
FROM (SELECT DISTINCT
        date
      FROM (
             SELECT
               DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
             FROM users_clicks
             UNION SELECT
                     DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
                   FROM users_installs
             UNION SELECT
                     DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date
                   FROM reporting_notes) data) dates
  LEFT JOIN (SELECT
               DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date,
               COUNT(*)                                        AS clicks,
               SUM(cost)                                       AS costs
             FROM users_clicks
             GROUP BY DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00'))) clicks ON clicks.date = dates.date
  LEFT JOIN (SELECT
               DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date,
               COUNT(*)                                        AS installs
             FROM users_installs
             GROUP BY DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00'))) installs ON installs.date = dates.date
  LEFT JOIN (SELECT
               DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date,
               COUNT(*)                                        AS note_count,
               GROUP_CONCAT(note SEPARATOR ', ')               AS notes
             FROM reporting_notes
             GROUP BY DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00'))) notes ON notes.date = dates.date;

The query gets easier if you know, that on each day where an install occurred, there also is a click as you can then use the click-query also as date-pool. In any case I would probably run a separate query to get the notes for each day to be able to act upon the text.

Count txids only once

And this query will ensure that each txid in users_installs and users_clicks is only counted once, on the earliest date it occurs. As there are two additional sub-sub-queries performance will be (at least) slightly worse. i would advice against this, if txids are already distinct. Performance can be increased, if you start filtering dates in the sub-queries.

SELECT
  dates.date,
  clicks.clicks,
  clicks.costs,
  installs.installs,
  notes.note_count,
  notes.notes
FROM (SELECT DISTINCT
        date
      FROM (
             SELECT
               DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
             FROM users_clicks
             UNION SELECT
                     DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
                   FROM users_installs
             UNION SELECT
                     DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date
                   FROM reporting_notes) data) dates
  LEFT JOIN (SELECT
               DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00')) AS date,
               COUNT(*)                                            AS clicks,
               SUM(max_costs)                                      AS costs
             FROM (SELECT
                     txid,
                     MIN(click_utc) AS min_click_utc,
                     MAX(cost)      AS max_costs
                   FROM users_clicks
                   GROUP BY txid) distinct_txids
             GROUP BY DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00'))) clicks ON clicks.date = dates.date
  LEFT JOIN (SELECT
               DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00')) AS date,
               COUNT(*)                                            AS installs
             FROM (SELECT
                     txid,
                     MIN(click_utc) AS min_click_utc
                   FROM users_installs
                   GROUP BY txid) distinct_txids
             GROUP BY DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00'))) installs ON installs.date = dates.date
  LEFT JOIN (SELECT
               DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date,
               COUNT(*)                                        AS note_count,
               GROUP_CONCAT(note SEPARATOR ', ')               AS notes
             FROM reporting_notes
             GROUP BY DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00'))) notes ON notes.date = dates.date;

Upvotes: 2

Kickstart
Kickstart

Reputation: 21533

Think this will require joining against sub queries.

One sub query to get the various dates used, then join that against a sub query to get the count and sub of clicks, and another to get the count of installs:-

SELECT sub0.aDate, click_count, click_cost, install_count, GROUP_CONCAT(reporting_notes.note)
FROM
(
    SELECT DATE(click_utc) AS aDate
    FROM users_clicks
    UNION
    SELECT DATE(click_utc)
    FROM users_installs
    UNION
    SELECT DATE(timestamp)
    FROM reporting_notes
) sub0
LEFT OUTER JOIN
(
    SELECT DATE(click_utc) AS aDate, COUNT(txid) AS click_count, SUM(cost) AS click_cost
    FROM users_clicks
    GROUP BY aDate
) sub1
ON sub0.aDate = sub1.aDate
LEFT OUTER JOIN
(
    SELECT DATE(click_utc) AS aDate, COUNT(txid) AS install_count
    FROM users_installs
    GROUP BY aDate
) sub2
ON sub0.aDate = sub2.aDate
LEFT OUTER JOIN reporting_notes
ON sub0.aDate = DATE(reporting_notes.timestamp)
GROUP BY sub0.aDate, click_count, click_cost, install_count

This could be done without the sub queries to do the counts if you only wanted unique notes and the sum wasn't required.

Upvotes: 1

Related Questions