Venu
Venu

Reputation: 3

SQL Server : Finding medians of multiple columns in the same table

I have a table with the following structure.

create table FindMedians 
( GroupByColumn varchar(100) 
, TimeInterval_1 int 
, TimeInterval_2 int 
, TimeInterval_3 int 
);

I need to find the medians of each of the time intervals for each group. I've been calculating the medians for each column separately and UNIONing them, and then PIVOTing to get a final result as:

GroupByColumn     Median1     Median2     Median3

using a query given in solution to Function to Calculate Median in Sql Server

Note: I'm just using the query, I haven't created a function.

The original table has close 500K rows, and trying to calculate the medians separately for each column is slow. Is there a well performing way that would give me the medians of all the columns in a single query, without having to calculate separately for each column?

Thanks

Upvotes: 0

Views: 1144

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1271013

DOn't worry about performance. The ranking functions should be sorting only the columns that are needed, not the entire data set. The performance is going to be affected primarily by how much of the data fits in memory. Since the original data fits in memory, and the ranking for each column should be an in-memory sort, performance should be pretty good.

For the record, there are O(n) algorithms for finding a median. I am not aware of any databases that implements them, though.

Upvotes: 0

podiluska
podiluska

Reputation: 51514

As calculating a median requires sorting the data, unless your data is in order, then there isn't really any alternative to sorting and calculating the medians independently.

Upvotes: 1

Related Questions