Reputation: 12794
Is there an agregate function that returns any value from a group. I could use MIN
or MAX
, but would rather avoid the overhead if possible given it's a text field.
My situation is an error log summary. The errors are grouped by the type of error and an example of the error text is displayed for each group. It doesn't matter which error message is used as the example.
SELECT
ref_code,
log_type,
error_number,
COUNT(*) AS count,
MIN(data) AS example
FROM data
GROUP BY
ref_code,
log_type,
error_number
What can I replace MIN(data)
with to not have to compare 100,000s of varchar(2000) values?
Upvotes: 8
Views: 1743
Reputation: 9500
Well, since you asked about OVER PARTITION AND ORDER BY, below is a version that does your GROUP BY, but then also uses ROW_NUMBER() with OVER and PARTITION AND ORDER BY, to number the first ref_code, log_type, error_num
combination it comes across as row number 1 (with whatever data column is there at 1). Then it renumbers, starting at 1, at the next distinct ref_code, log_type, error_num
combination it finds (with whatever data column that happens to be there). So you can then simply pull the data field at row number 1 as a representative data field for a given ref_code, log_type, error_num
.
It's still lacking something. It would be more elegant if I didn't have the double pass (once for aggregation and once for row_number()); however, it might perform very well none-the-less. I'll have to think about it some more to see if I can eliminate the double pass.
But it avoids any comparison of the large data field. And it is represents a way to do what you asked: to pull 1 representative sample from the data field in correlation with the aggregated fields.
SELECT
t.ref_code,
t.log_type,
t.error_number,
t.count,
d.data
FROM
(
SELECT
ref_code,
log_type,
error_number,
COUNT(*) as count
FROM data
GROUP BY
ref_code,
log_type,
error_number
) t
INNER JOIN
(
SELECT
ref_code,
log_type,
error_number,
data,
ROW_NUMBER() OVER
(
PARTITION BY
ref_code,
log_type,
error_number
ORDER BY
ref_code,
log_type,
error_number
) as row_number
FROM data
) d on
d.ref_code = t.ref_code and
d.log_type = t.log_type and
d.error_number = t.error_number and
row_number = 1
Final caveat: I don't have Oracle to try this on. But I did put it together from reading Oracle documentation.
I added the below after I thought further how to elminate the GROUP BY, which I only had in there for COUNT(*). Don't know if it's any faster though.
SELECT *
FROM
(
SELECT
ref_code,
log_type,
error_number,
data,
ROW_NUMBER() OVER
(
PARTITION BY
ref_code,
log_type,
error_number
ORDER BY
ref_code,
log_type,
error_number
) as row_number,
COUNT(*) OVER
(
PARTITION BY
ref_code,
log_type,
error_number
ORDER BY
ref_code,
log_type,
error_number
) as count
FROM data
) t
WHERE row_number = 1
Upvotes: 2
Reputation: 12794
Going by the proposed answers, it appears that MIN(data)
(or MAX(data)
) is the fastest way to achieve what I want. I'm trying to over-optimise unnecessarily.
I'll try out any other answers that come up while I have access to this database, but in the mean time, this comes out on top.
Thank you for everyone's effort!
Upvotes: 3
Reputation: 10411
you can use MIN coupled with KEEP, like this:
MIN(data) keep (dense_rank first order by rowid) AS EXAMPLE
The idea behind this is that the database engine will be sorting data over ROWID instead of the VARCHAR(2000) values, which theoretically should be faster. You can replace ROWID with the primary key value, and check if it's faster
Upvotes: 4