Reputation: 513
I have a query I'm writing that needs to run on both SQL Server and MySQL that involves performing a very lengthy calculation. If I write it all in one query, it involves a ton of copy and pasting and redundant, hard-to-maintain code. Therefore, I wrote a much more compact version that essentially looks like this:
SELECT calc3...
FROM
(
SELECT calc2...
FROM
(
SELECT calc1...
FROM original_table
)
)
Are there any reasons to believe this will lead to poor performance? e.g. will each select cause SQL Server or MySQL to create an additional temporary table behind the scenes, thereby tripling the execution time?
The calculations don't involve querying any additional tables, just adding up column values, mapping integers to strings, etc.
I only have access to a small testing dataset, so I can't run it and time it, since it takes less than a second to run with the small dataset. And I don't have permissions to view the execution plan. So I'm not sure how to go about verifying that this won't be a disaster, performance-wise. (I can test it on copies of the productive databases, but it's a long and painful process)
EDIT: just to be clear, I will be testing this on copies of the production databases, I'm just hoping to see if there are any obvious pitfalls here before I do so, as doing so will be a time intensive process. e.g. if I were calling ALTER TABLE a bunch of times, someone might point out that ALTER TABLE can be extremely slow for MySQL, so combining it all into one call or avoiding it altogether would be advisable
EDIT2:
The table in question looks like this:
pid Name
1 Value1
1 Value2
1 Value4
2 Value2
2 Value5
3
4 Value1
4 Value1
4 Value4
and it needs to be transformed into a table like this:
pid Output
1 'Many values'
2 'Two and five'
3 'Missing'
4 'Values 1 and 4'
where the values in the Output column are based on what values a given pid has in the Name column in the first table. Unfortunately it is not possible to derive these values by concatenating Name values or anything like that, they are fairly idiosyncratic and all need to be defined individually.
My code, with some simplifications, looks like this:
SELECT pid,
CASE
WHEN TotalValues > 3 THEN 'Many values'
WHEN TotalValues = 2 AND Value2 = 5 AND Value5 = 1 THEN 'Two and five'
WHEN TotalValues = 2 AND Value1 = 1 AND Value4 = 1 THEN 'Values 1 and 4'
WHEN TotalValues = 1 AND Value1 = 1 THEN 'Only value 1'
When TotalValues = 0 THEN 'Missing'
-- About a dozen more of these combinations
END as OutputValue
FROM
(
SELECT *, Value1 + Value7 as TotalValues
FROM
(
SELECT distinct p.pId
COALESCE(MAX(CASE WHEN rc.Name = 'Value1' THEN 1 ELSE 0 END),0) as Value1,
-- more of these that I'm omitting for concision
COALESCE(MAX(CASE WHEN rc.Name = 'Value7' THEN 1 ELSE 0 END),0) as Value7,
FROM primary_table AS p
LEFT JOIN rc on rc.code=p.code
WHERE p.desiredRecords = TRUE
GROUP BY p.pid
) t1
) t2
EDIT3:
Result of running MySQL query profiler in SQLYog
Upvotes: 0
Views: 121
Reputation: 1269753
MySQL has a tendency to materialize subqueries. With the more recent versions, it does not always do this. But there is a general tendency to avoid unnecessary subqueries for this reason.
SQL Server and other databases are much more sophisticated in their optimization techniques. I am not aware of anything in SQL Server that spurs the materialization of subqueries. If you look at the execution plan for a complex SQL query, you will be hard-pressed to figure out where the subqueries are. So, in most databases, I would not worry about it.
CTEs are a different matter -- you don't ask about them, but they are a natural follow-on. Some databases never materialize CTEs. I think some always materialize CTEs. And some sometimes materialize them. For that, you need to be sensitive to the database and version that you are using.
Upvotes: 1
Reputation: 89071
I only have access to a small testing dataset, so I can't run it and time it, since it takes less than a second to run with the small dataset. And I don't have permissions to view the execution plan. So I'm not sure how to go about verifying that this won't be a disaster, performance-wise. (I can test it on copies of the productive databases, but it's a long and painful process)
That's the problem you need to fix. Strangers' speculation on the performance of imaginary queries on multiple database engines is nothing to go on.
Upvotes: 0