Reputation: 1364
I am trying to create multiple Z Scores [(value-AVG)/STD] within one query and filter them.
So far, I came up with this; before you look at it, however, theres some french words, "quartier" is like a district/borough and "prix" is a price.
SELECT ((t1.prix-A1.HMEAN)*100/A1.STD_SAMPLE) AS Z,
q.myId AS q_myId
FROM immobilier_ad_blank AS t1
LEFT JOIN Adresse AS a ON t1.adresse_id=a.id
LEFT JOIN Quartier AS q ON a.quartier_id=q.id
CROSS JOIN (
SELECT AVG(t2.prix) AS HMEAN,
STD(t2.prix) AS STD_SAMPLE,
q.myId AS quartier_myId
FROM immobilier_ad_blank AS t2
LEFT JOIN Adresse AS a ON t2.adresse_id=a.id
LEFT JOIN Quartier AS q ON a.quartier_id=q.id
GROUP BY quartier_myId
) AS A1 ON q.myId = A1.quartier_myId
It seems to work, however I'm not sure if Im doing this right, I only added one filtering condition but I will add up to 6 conditions and so far this method seems quite cumbersome for what it does.
Ultimately it would look like this
SELECT
((t1.prix-A1.HMEAN)*100/A1.STD_SAMPLE) AS Z1,
((t1.prix-A2.HMEAN)*100/A1.STD_SAMPLE) AS Z2,
((t1.prix-AN.HMEAN)*100/A1.STD_SAMPLE) AS ZN,
All Relevant selects
FROM immobilier_ad_blank AS t1
All Relevant LEFT Joins
CROSS JOIN (
SELECT AVG(t2.prix) AS HMEAN,
STD(t2.prix) AS STD_SAMPLE,
q.myId AS quartier_myId
FROM immobilier_ad_blank AS t2
LEFT JOIN Adresse AS a ON t2.adresse_id=a.id
LEFT JOIN Quartier AS q ON a.quartier_id=q.id
GROUP BY quartier_myId
) AS A1 ON q.myId = A1.quartier_myId
CROSS JOIN (
SELECT AVG(t2.prix) AS HMEAN,
STD(t2.prix) AS STD_SAMPLE,
q.myId AS quartier_myId
FROM immobilier_ad_blank AS t3
Other Joins
Group By Other conditions
) AS A2 ON OtherConditions
CROSS JOIN (
SELECT AVG(tN.prix) AS HMEAN,
STD(tN.prix) AS STD_SAMPLE,
q.myId AS quartier_myId
FROM immobilier_ad_blank AS tN
Other Joins
Group By Other conditions
) AS AN ON OtherConditions
This is in mysql, according to manual cross join = inner join = join. However from what I understood cross join is table X table, but in my case since it returns aggregates, I suppose its not t^2 but t*(number of aggregates) right ?
AM I right on the right track ? Optimization Wise ? Did I put my filtering at the right place ?
I basically wanna select all rows, find their averages function of their join values, calculate their Z Scores, and add them up within less than 5 sec and with around 1M data.
EDIT 1:
I'll simplify my question,
SELECT
((t1.prix-A1.HMEAN)*100/A1.STD_SAMPLE) AS Z,
((t1.prix-A2.HMEAN)*100/A2.STD_SAMPLE) AS Z2,
q.myId AS q_myId,
c.myId AS c_myId,
s.myId as size_myId
FROM immobilier_ad_blank AS t1
LEFT JOIN Size AS s ON t1.size_id=s.id
LEFT JOIN Adresse AS a ON t1.adresse_id=a.id
LEFT JOIN City AS c ON a.city_id=c.id
LEFT JOIN Quartier AS q ON a.quartier_id=q.id
CROSS JOIN (
SELECT AVG(t2.prix) AS HMEAN,
STD(t2.prix) AS STD_SAMPLE,
q.myId AS quartier_myId
s.myId as size_myId
FROM immobilier_ad_blank AS t2
LEFT JOIN Size AS s ON t2.size_id=s.id
LEFT JOIN Adresse AS a ON t2.adresse_id=a.id
LEFT JOIN Quartier AS q ON a.quartier_id=q.id
GROUP BY quartier_myId, size_myId
) AS A1 ON q.myId = A1.quartier_myId
AS A2 on s.myId=A2.size_myId #<--------- Is this line possible ?
EDIT 3:
I ended up using temporary tables and duplicating them because views in my case were way slower than temp tables even though the underlying data was correctly indexed.
CREATE TEMPORARY TABLE IF NOT EXISTS A1 AS (
SELECT AVG(t2.prix) AS HMEAN,
STD(t2.prix) AS STD_SAMPLE,
s.myId AS size_myId,
q.myId AS quartier_myId
FROM immobilier_ad_blank AS t2
LEFT JOIN Size AS s ON t2.size_id=s.id
LEFT JOIN Adresse AS a ON t2.adresse_id=a.id
LEFT JOIN Quartier AS q ON a.quartier_id=q.id
GROUP BY quartier_myId,size_myId);
CREATE TEMPORARY TABLE A2 LIKE A1;
INSERT A2 SELECT * FROM A1;
SELECT
((c.prix-A1.HMEAN)*100/A1.STD_SAMPLE) AS Z1,
((c.prix-A2.HMEAN)*100/A2.STD_SAMPLE) AS Z2,
q.myId AS quartier_myId,
s.myId AS size_myId
FROM immobilier_ad_blank AS c
LEFT JOIN Size AS s ON c.size_id=s.id
LEFT JOIN Adresse AS ad ON c.adresse_id=ad.id
LEFT JOIN Quartier AS q ON ad.quartier_id=q.id
JOIN A1 on A1.quartier_myId = q.myId
JOIN A2 AS A2 on A2.size_myId = s.myId
Basically, joining the same virtual table (subquery) === Views || Temp Tables And to be honest, I might end up creating a permanent table and update it once a day....
Upvotes: 2
Views: 73
Reputation: 108686
CROSS JOIN ... ON condition
makes the cross join into an ordinary INNER JOIN
or, for brevity, just JOIN
. So don't worry about the combinatorial explosion. A correctly written ON
condition prevents it.In short, forget about CROSS JOIN
. Just write JOIN
.
You can write your JOIN
as something like this
FROM table_a AS a
JOIN table_b AS b ON a.id = b.id
Or, you can express either table_a or table_b, or both, as subqueries. For example, you can write this query.
FROM table_A AS a
JOIN (
SELECT MAX(id) id, district
FROM table_b
GROUP BY district
) AS b on a.id = b.id
In other words, you can use a physical table (table_b
) or a virtual table (a subquery) interchangeably.
In your case, your subquery is a four-column table: (HMEAN, STD_SAMPLE, quartier_myId, and size_myId) .
How will you join that table to the rest of the tables in your query?
You have the code
JOIN (subquery) AS A1 ON q.myId = A1.quartier_myId
But, to use the fourth column of your subquery (size_myId) you need to use it also for the JOIN. To do this, put an AND
in your ON
clause. Do something like this:
JOIN (subquery) AS A1 ON q.myId = A1.quartier_myId
AND s.myId = A.size_myId
If the meaning of s.myId is correct, that should give you a useful result.
Compound ON
clauses are very useful. You can do such things as this:
LEFT JOIN (subquery) AS A1 ON q.myId = A1.quartier_myId
AND s.myId = A.size_myId
AND A.HMEAN > 7.5
to filter your results.
Upvotes: 2