Reputation: 2961
I have 5 tables with identical structures; only the PAGEVISITS values differ between them. For example, table1:
ITEM  | PAGEVISITS | Commodity
1813  | 50         | Griddle
1851  | 10         | Griddle
11875 | 100        | Refrigerator
2255  | 25         | Refrigerator
and table2:
ITEM  | PAGEVISITS | Commodity
1813  | 0          | Griddle
1851  | 10         | Griddle
11875 | 25         | Refrigerator
2255  | 10         | Refrigerator
I want to add up PAGEVISITS per Commodity for each table, giving:
table1 | table2 | Commodity
60     | 10     | Griddle
125    | 35     | Refrigerator
Some of the totals from the query below are correct, but others are WAY off:
SELECT
SUM(MT.PAGEVISITS) as table1,
SUM(CT1.PAGEVISITS) as table2,
SUM(CT2.PAGEVISITS) as table3,
SUM(CT3.PAGEVISITS) as table4,
SUM(CT4.PAGEVISITS) as table5,
(COUNT(DISTINCT MT.ITEM)) + (COUNT(DISTINCT CT1.ITEM)) + (COUNT(DISTINCT CT2.ITEM)) + (COUNT(DISTINCT CT3.ITEM)) + (COUNT(DISTINCT CT4.ITEM)) as Total,
MT.Commodity
FROM table1 as MT
LEFT JOIN table2 CT1
on MT.ITEM = CT1.ITEM
LEFT JOIN table3 CT2
on MT.ITEM = CT2.ITEM
LEFT JOIN table4 CT3
on MT.ITEM = CT3.ITEM
LEFT JOIN table5 CT4
on MT.ITEM = CT4.ITEM
GROUP BY Commodity
I believe this may be caused by using LEFT JOIN incorrectly. I have also tried INNER JOIN, with the same inconsistent results.
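One common way joined sums go "WAY off" is row multiplication (fan-out): if a joined table has more than one row for the same ITEM, each matching table1 row is repeated once per match, and its PAGEVISITS gets summed repeatedly. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL and adds a hypothetical second row for item 1813 in table2 (not part of the data shown above):

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    CREATE TABLE table2 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    INSERT INTO table1 VALUES (1813, 50, 'Griddle'), (1851, 10, 'Griddle');
    -- Hypothetical duplicate: item 1813 has two rows in table2.
    INSERT INTO table2 VALUES (1813, 0, 'Griddle'), (1813, 5, 'Griddle'),
                              (1851, 10, 'Griddle');
""")

# Same shape as the question's query, reduced to two tables.
joined = conn.execute("""
    SELECT SUM(MT.PAGEVISITS), SUM(CT1.PAGEVISITS), MT.Commodity
    FROM table1 AS MT
    LEFT JOIN table2 CT1 ON MT.ITEM = CT1.ITEM
    GROUP BY MT.Commodity
""").fetchall()

# Summing table1 on its own gives the intended total.
direct = conn.execute(
    "SELECT SUM(PAGEVISITS) FROM table1 WHERE Commodity = 'Griddle'"
).fetchone()[0]

print(joined)  # [(110, 15, 'Griddle')]: the 50 for item 1813 is counted twice
print(direct)  # 60
```

With five joined tables the effect compounds: duplicates in any one table multiply the sums taken from all the others.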
Upvotes: 0
Views: 207
Reputation: 108380
I would UNION ALL the rows from all five of those tables into a single rowset (an inline view), and then run a query against that. Start with something like this:
SELECT SUM(IF(t.source='MT',t.pagevisits,0)) AS table1
, SUM(IF(t.source='CT1',t.pagevisits,0)) AS table2
, t.commodity
FROM ( SELECT 'MT' as source, table1.* FROM table1
UNION ALL
SELECT 'CT1', table2.* FROM table2
UNION ALL
SELECT 'CT2', table3.* FROM table3
UNION ALL
SELECT 'CT3', table4.* FROM table4
UNION ALL
SELECT 'CT4', table5.* FROM table5
) t
GROUP BY t.commodity
(But I would specify the column list for each of those tables rather than using '.*', so the query doesn't break when someone adds, drops, renames, or reorders columns in any of those tables.)
I include an "extra" literal value (aliased as "source") to identify which table each row came from. A conditional expression in the SELECT list can then test whether a row came from a particular table.
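To check the approach against the sample data in the question, here is a sketch that runs it in Python's built-in `sqlite3` module as a stand-in for MySQL. Two assumptions: only table1 and table2 are populated (those are the only tables shown), and `CASE WHEN ... END` replaces MySQL's `IF()`, which SQLite doesn't have (CASE is standard SQL and works in MySQL too). The inline view lists its columns explicitly rather than using `.*`:

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    CREATE TABLE table2 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    INSERT INTO table1 VALUES
        (1813, 50, 'Griddle'), (1851, 10, 'Griddle'),
        (11875, 100, 'Refrigerator'), (2255, 25, 'Refrigerator');
    INSERT INTO table2 VALUES
        (1813, 0, 'Griddle'), (1851, 10, 'Griddle'),
        (11875, 25, 'Refrigerator'), (2255, 10, 'Refrigerator');
""")

# Tag each row with its source table, stack them, then sum conditionally.
QUERY = """
SELECT SUM(CASE WHEN t.source = 'MT'  THEN t.pagevisits ELSE 0 END) AS table1,
       SUM(CASE WHEN t.source = 'CT1' THEN t.pagevisits ELSE 0 END) AS table2,
       t.commodity
FROM ( SELECT 'MT' AS source, item, pagevisits, commodity FROM table1
       UNION ALL
       SELECT 'CT1', item, pagevisits, commodity FROM table2
     ) t
GROUP BY t.commodity
ORDER BY t.commodity
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# (60, 10, 'Griddle')
# (125, 35, 'Refrigerator')
```

Adding table3 to table5 means one more UNION ALL branch and one more conditional SUM per table.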
This approach is quite flexible and can produce more complicated resultsets. For example, to also get the total number of page visits from table3, table4, and table5 combined, alongside the individual sums, add this to the SELECT list:
SUM(IF(t.source IN ('CT2','CT3','CT4'),t.pagevisits,0)) AS total_345
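A small sketch of the `total_345` idea, again using Python's `sqlite3` as a stand-in for MySQL with `CASE` in place of `IF()`. The table `t` is a hypothetical stand-in for the UNION ALL inline view, and its page-visit numbers for CT2 to CT4 are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the UNION ALL inline view: one row per (source, item).
conn.execute("CREATE TABLE t (source TEXT, item INTEGER, pagevisits INTEGER, commodity TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("MT", 1813, 50, "Griddle"),
        ("CT1", 1813, 0, "Griddle"),
        ("CT2", 1813, 7, "Griddle"),
        ("CT3", 1813, 3, "Griddle"),
        ("CT4", 1813, 2, "Griddle"),
    ],
)

row = conn.execute("""
    SELECT SUM(CASE WHEN source = 'CT2' THEN pagevisits ELSE 0 END) AS table3,
           -- one conditional SUM can cover several source tables at once
           SUM(CASE WHEN source IN ('CT2', 'CT3', 'CT4') THEN pagevisits ELSE 0 END) AS total_345,
           commodity
    FROM t
    GROUP BY commodity
""").fetchone()
print(row)  # (7, 12, 'Griddle')
```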
To get the equivalent of your COUNT(DISTINCT item) + COUNT(DISTINCT item) + ... expression:
I would use an expression that builds a single value from both the "source" and "item" columns, taking care that no "source"+"item" combination can duplicate some other "source"+"item" combination. (If we simply concatenate strings, for example, we have no way to distinguish 'A'+'11' from 'A1'+'1'.) The most common approach I see here is a carefully chosen delimiter that is guaranteed not to appear in either value. We can distinguish 'A::11' from 'A1::1', so something like this will work:
COUNT(DISTINCT CONCAT(t.source,'::',t.item))
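The collision the delimiter prevents can be shown directly. This sketch uses Python's `sqlite3` as a stand-in for MySQL, where `||` plays the role of MySQL's `CONCAT()`; the two rows are the 'A'+'11' and 'A1'+'1' pairs from the paragraph above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (source TEXT, item TEXT)")
# Two different (source, item) pairs that collide when concatenated without a delimiter.
conn.executemany("INSERT INTO t VALUES (?, ?)", [("A", "11"), ("A1", "1")])

# SQLite uses || for string concatenation; MySQL would use CONCAT().
plain, delimited = conn.execute("""
    SELECT COUNT(DISTINCT source || item),
           COUNT(DISTINCT source || '::' || item)
    FROM t
""").fetchone()
print(plain, delimited)  # 1 2
```

Without the delimiter both rows become 'A11' and are counted once; with it they stay distinct.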
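In your current query, if item is NULL (for example, when a LEFT JOIN finds no matching row), that row isn't included in the COUNT. MySQL's CONCAT() returns NULL when any argument is NULL, so the expression above already skips those rows; to make that behavior explicit, you could write something like this:
COUNT(DISTINCT IF(t.item IS NOT NULL,CONCAT(t.source,'::',t.item),NULL)) AS Total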
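A quick check that the implicit and explicit forms agree, sketched with Python's `sqlite3` as a stand-in for MySQL. SQLite's `||` behaves like MySQL's `CONCAT()` here (a NULL operand makes the result NULL), and `COUNT(DISTINCT ...)` ignores NULLs. The table `t` and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the inline view; one row has a NULL item.
conn.execute("CREATE TABLE t (source TEXT, item INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("MT", 1813), ("MT", None), ("CT1", 1813)],
)

implicit, explicit = conn.execute("""
    SELECT COUNT(DISTINCT t.source || '::' || t.item),
           COUNT(DISTINCT CASE WHEN t.item IS NOT NULL
                               THEN t.source || '::' || t.item END)
    FROM t
""").fetchone()
print(implicit, explicit)  # 2 2
```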
Of course, getting a count of distinct item values over the whole set of five tables is much simpler, though it returns a different result (an item that appears in several tables is counted only once):
COUNT(DISTINCT t.item)
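To see how the two counts differ, this sketch loads the question's four ITEM values for both table1 ('MT') and table2 ('CT1') into a hypothetical stand-in for the inline view, using Python's `sqlite3` in place of MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (source TEXT, item INTEGER)")
# The question's ITEM values, present in both table1 ('MT') and table2 ('CT1').
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(src, item) for src in ("MT", "CT1") for item in (1813, 1851, 11875, 2255)],
)

per_source, overall = conn.execute("""
    SELECT COUNT(DISTINCT source || '::' || item),  -- distinct within each table, added up
           COUNT(DISTINCT item)                     -- distinct across all tables
    FROM t
""").fetchone()
print(per_source, overall)  # 8 4
```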
As for your question about the LEFT JOIN: the table on the left side is the "driver", so a row has to exist in that table for a matching row to be retrieved from a table on the right. Rows in the right-side tables that have no match in the left table are not returned.
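The sketch below contrasts the two approaches when table2 has a row with no match in table1. It uses Python's `sqlite3` as a stand-in for MySQL, and item 9999 is a hypothetical row added only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    CREATE TABLE table2 (ITEM INTEGER, PAGEVISITS INTEGER, Commodity TEXT);
    INSERT INTO table1 VALUES (1813, 50, 'Griddle');
    -- Hypothetical item 9999 exists only in table2.
    INSERT INTO table2 VALUES (1813, 5, 'Griddle'), (9999, 40, 'Griddle');
""")

# table1 drives the LEFT JOIN, so table2's row for 9999 never appears.
joined = conn.execute("""
    SELECT SUM(MT.PAGEVISITS), SUM(CT1.PAGEVISITS)
    FROM table1 AS MT
    LEFT JOIN table2 CT1 ON MT.ITEM = CT1.ITEM
""").fetchone()

# UNION ALL keeps every row from every table.
unioned = conn.execute("""
    SELECT SUM(CASE WHEN source = 'MT'  THEN PAGEVISITS ELSE 0 END),
           SUM(CASE WHEN source = 'CT1' THEN PAGEVISITS ELSE 0 END)
    FROM ( SELECT 'MT' AS source, ITEM, PAGEVISITS FROM table1
           UNION ALL
           SELECT 'CT1', ITEM, PAGEVISITS FROM table2 ) t
""").fetchone()

print(joined)   # (50, 5)
print(unioned)  # (50, 45)
```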
If what you have is basically five "partitions", and you want to process all of the rows whether or not a matching row appears in any of the other "partitions", I would go with the UNION ALL approach: simply concatenate all of the rows from all of those tables and process them as if they came from a single table.
NOTE: For very large tables, this may not be a feasible approach, since MySQL is going to have to materialize that inline view. There are other approaches which don't require concatenating all of the rows together.
Specifying a list of only the columns you need, in the SELECT from each table, may help performance, if there are columns in those tables you don't need to reference in your query.
Upvotes: 2