Reputation: 10071
I am wondering how 4 queries I use can be consolidated into one query. I was able to query for all days represented in my dataset and the number of clicks for each specific day. Ideally I would be able to get the number of installs (for each day), total cost (for each day) and notes (for each day) from that query as well.
I have thought about how to achieve this, but have not come up with any complete solutions as I am a novice with MySQL. Is there a good way to pull data from multiple tables?
$days = mysqli_query($link, "
SELECT
t1.date,
t1.clicks
FROM (SELECT
date_format(date_sub(ic.click_utc, INTERVAL 7 HOUR), '%Y-%m-%d') as date,
count(distinct ic.txid) as clicks
FROM users_clicks ic
GROUP BY date_format(date_sub(click_utc, INTERVAL 7 HOUR), '%Y-%m-%d')
ORDER BY date DESC) t1
WHERE date >= '2014-06-28'");
while ($day = mysqli_fetch_assoc($days)) {
$date = $day['date'];
$day_clicks = $day['clicks'];
$day_installs = mysqli_fetch_row(mysqli_query($link, sprintf("SELECT count(txid) FROM (SELECT txid FROM users_installs WHERE date_format(date_sub(click_utc, INTERVAL 7 HOUR), '%%Y-%%m-%%d') = '%s' GROUP BY txid) table1", $day['date'])));
$day_cost = mysqli_fetch_row(mysqli_query($link, sprintf("SELECT sum(earnings) FROM (SELECT max(cost) as earnings FROM users_clicks WHERE date_format(date_sub(click_utc, INTERVAL 7 HOUR), '%%Y-%%m-%%d') = '%s' GROUP BY txid) table1", $day['date'])));
$note = mysqli_fetch_assoc(mysqli_query($link, sprintf("SELECT * FROM reporting_notes WHERE date_format(date_sub(timestamp, INTERVAL 7 HOUR), '%%Y-%%m-%%d') = '%s' LIMIT 1", $day['date'])));
}
Here are the structures of the tables:
Users_clicks:
txid | click_utc | cost
12t3 | 2014-08-19 07:08:47 | 0.50
27a5 | 2014-08-18 03:28:03 | 0.25
48a5 | 2014-08-17 12:55:23 | 0.25
Users_installs:
txid | click_utc
1o23 | 2014-08-19 07:08:47
1ee3 | 2014-08-17 11:10:53
Reporting_notes:
timestamp | note
2014-08-19 07:08:47 | "hey"
2014-08-17 03:02:41 | "hey"
desired output: number of clicks, number of installs, sum cost and any notes for each individual day that is in the dataset
Upvotes: 2
Views: 133
Reputation: 4465
First off: MySQL is able to handle named time-zones, but you need to populate the information tables on time-zones first. See to documentation for CONVERT_TZ
as a starting point.
As your tables seem to be unrelated (i.e. there is no master-table containing all txid
s, ...) you will probably be best off using sub-queries which are close to firing separate queries in the beginning.
SELECT
DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date,
COUNT(*) AS clicks,
SUM(cost) AS costs
FROM users_clicks
GROUP BY DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00'));
Only counting each txid once
SELECT
DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00')) AS date,
COUNT(*) AS clicks,
SUM(max_costs) AS costs
FROM (SELECT
txid,
MIN(click_utc) AS min_click_utc,
MAX(cost) AS max_costs
FROM users_clicks
GROUP BY txid) distinct_txids
GROUP BY DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00'))
SELECT
DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date,
COUNT(*) AS installs
FROM users_installs
GROUP BY DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00'));
Only counting each txid once
SELECT
DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00')) AS date,
COUNT(*) AS installs
FROM (SELECT
txid,
MIN(click_utc) AS min_click_utc
FROM users_installs
GROUP BY txid) distinct_txids
GROUP BY DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00'));
(note that GROUP_CONCAT
will by default only return 1024 characters. This can be changed by setting group_concat_max_len to a higher value):
SELECT
DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date,
COUNT(*) AS note_count,
GROUP_CONCAT(note SEPARATOR ', ') AS notes
FROM reporting_notes
GROUP BY DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00'));
If you want to get everything at once you need to first get a common date-pool As MySQL only has LEFT
or RIGHT OUTER JOIN
s, but not FULL OUTER JOIN
s:
SELECT DISTINCT
date
FROM (
SELECT
DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
FROM users_clicks
UNION SELECT
DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
FROM users_installs
UNION SELECT
DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date
FROM reporting_notes) dates;
and then you can combine it all into
SELECT
dates.date,
clicks.clicks,
clicks.costs,
installs.installs,
notes.note_count,
notes.notes
FROM (SELECT DISTINCT
date
FROM (
SELECT
DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
FROM users_clicks
UNION SELECT
DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
FROM users_installs
UNION SELECT
DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date
FROM reporting_notes) data) dates
LEFT JOIN (SELECT
DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date,
COUNT(*) AS clicks,
SUM(cost) AS costs
FROM users_clicks
GROUP BY DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00'))) clicks ON clicks.date = dates.date
LEFT JOIN (SELECT
DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date,
COUNT(*) AS installs
FROM users_installs
GROUP BY DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00'))) installs ON installs.date = dates.date
LEFT JOIN (SELECT
DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date,
COUNT(*) AS note_count,
GROUP_CONCAT(note SEPARATOR ', ') AS notes
FROM reporting_notes
GROUP BY DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00'))) notes ON notes.date = dates.date;
The query gets easier if you know, that on each day where an install occurred, there also is a click as you can then use the click-query also as date-pool. In any case I would probably run a separate query to get the notes for each day to be able to act upon the text.
And this query will ensure that each txid in users_installs and users_clicks is only counted once, on the earliest date it occurs. As there are two additional sub-sub-queries performance will be (at least) slightly worse. i would advice against this, if txids are already distinct. Performance can be increased, if you start filtering dates in the sub-queries.
SELECT
dates.date,
clicks.clicks,
clicks.costs,
installs.installs,
notes.note_count,
notes.notes
FROM (SELECT DISTINCT
date
FROM (
SELECT
DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
FROM users_clicks
UNION SELECT
DATE(CONVERT_TZ(click_utc, '+00:00', '-07:00')) AS date
FROM users_installs
UNION SELECT
DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date
FROM reporting_notes) data) dates
LEFT JOIN (SELECT
DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00')) AS date,
COUNT(*) AS clicks,
SUM(max_costs) AS costs
FROM (SELECT
txid,
MIN(click_utc) AS min_click_utc,
MAX(cost) AS max_costs
FROM users_clicks
GROUP BY txid) distinct_txids
GROUP BY DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00'))) clicks ON clicks.date = dates.date
LEFT JOIN (SELECT
DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00')) AS date,
COUNT(*) AS installs
FROM (SELECT
txid,
MIN(click_utc) AS min_click_utc
FROM users_installs
GROUP BY txid) distinct_txids
GROUP BY DATE(CONVERT_TZ(min_click_utc, '+00:00', '-07:00'))) installs ON installs.date = dates.date
LEFT JOIN (SELECT
DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00')) AS date,
COUNT(*) AS note_count,
GROUP_CONCAT(note SEPARATOR ', ') AS notes
FROM reporting_notes
GROUP BY DATE(CONVERT_TZ(timestamp, '+00:00', '-07:00'))) notes ON notes.date = dates.date;
Upvotes: 2
Reputation: 21533
Think this will require joining against sub queries.
One sub query to get the various dates used, then join that against a sub query to get the count and sub of clicks, and another to get the count of installs:-
SELECT sub0.aDate, click_count, click_cost, install_count, GROUP_CONCAT(reporting_notes.note)
FROM
(
SELECT DATE(click_utc) AS aDate
FROM users_clicks
UNION
SELECT DATE(click_utc)
FROM users_installs
UNION
SELECT DATE(timestamp)
FROM reporting_notes
) sub0
LEFT OUTER JOIN
(
SELECT DATE(click_utc) AS aDate, COUNT(txid) AS click_count, SUM(cost) AS click_cost
FROM users_clicks
GROUP BY aDate
) sub1
ON sub0.aDate = sub1.aDate
LEFT OUTER JOIN
(
SELECT DATE(click_utc) AS aDate, COUNT(txid) AS install_count
FROM users_installs
GROUP BY aDate
) sub2
ON sub0.aDate = sub2.aDate
LEFT OUTER JOIN reporting_notes
ON sub0.aDate = DATE(reporting_notes.timestamp)
GROUP BY sub0.aDate, click_count, click_cost, install_count
This could be done without the sub queries to do the counts if you only wanted unique notes and the sum wasn't required.
Upvotes: 1