Reputation: 51
I am attempting to rank by % return for each day across 700 symbols.
For example:
date symbol pct_return
-----------------------------
1100101 IBM 1.2
1100101 AAPL 2.1
1100101 HPQ -0.5
approx 700 more entries like this for date 1100101
1100102 IBM -.02
1100102 AAPL -.6
1100102 HPQ 1.9
approx 700 more entries like this for date 1100102
What I am trying to do is create a query or stored procedure to loop through each day, and then rank and insert the rank value for the percent return for each symbol within each day.
I would like to insert the rank values for both ascending and descending ranking of percent return.
Sample table for just 3 symbols after ranking would look like:
date symbol pct_return rank_asc rank_desc
------------------------------------------------------
1100101 IBM 1.2 2 2
1100101 AAPL 2.1 3 1
1100101 HPQ -0.5 1 3
1100102 IBM -.02 2 2
1100102 AAPL -.6 1 3
1100102 HPQ 1.9 3 1
Upvotes: 2
Views: 1278
Reputation: 656
This is a typical within-group aggregate problem, which can be solved with a left self-exclusion join.
You don't need a stored procedure to get the results you want; a simple INSERT INTO ... SELECT ... query will do the trick.
Here is an example script with the provided data:
CREATE TABLE shuffled_symbols (
    dat        INT          NOT NULL,
    symbol     VARCHAR(4)   NOT NULL,
    pct_return DECIMAL(4,2) NOT NULL,
    PRIMARY KEY (dat, symbol)
);
CREATE TABLE ranked_symbols (
    dat        INT          NOT NULL,
    symbol     VARCHAR(4)   NOT NULL,
    pct_return DECIMAL(4,2) NOT NULL,
    rank_asc   INT UNSIGNED NOT NULL,
    rank_desc  INT UNSIGNED NOT NULL
);
INSERT INTO shuffled_symbols (dat, symbol, pct_return) VALUES (1100101, 'IBM', 1.2);
INSERT INTO shuffled_symbols (dat, symbol, pct_return) VALUES (1100101, 'AAPL', 2.1);
INSERT INTO shuffled_symbols (dat, symbol, pct_return) VALUES (1100101, 'HPQ', -0.5);
INSERT INTO shuffled_symbols (dat, symbol, pct_return) VALUES (1100102, 'IBM', -0.02);
INSERT INTO shuffled_symbols (dat, symbol, pct_return) VALUES (1100102, 'AAPL', -0.6);
INSERT INTO shuffled_symbols (dat, symbol, pct_return) VALUES (1100102, 'HPQ', 1.9);
Here is the query to compute ranks:
INSERT INTO ranked_symbols (dat, symbol, pct_return, rank_asc, rank_desc)
-- rank_desc: number of rows with a higher return on the same day, plus one
SELECT ars.dat, ars.symbol, ars.pct_return, ars.rank_asc,
       COUNT(ss3.dat) + 1 AS rank_desc
FROM (
    -- rank_asc: number of rows with a lower return on the same day, plus one
    SELECT ss1.dat, ss1.symbol, ss1.pct_return,
           COUNT(ss2.dat) + 1 AS rank_asc
    FROM shuffled_symbols ss1
    LEFT JOIN shuffled_symbols ss2
           ON ss2.dat = ss1.dat
          AND ss2.pct_return < ss1.pct_return
    GROUP BY ss1.dat, ss1.symbol, ss1.pct_return
) ars
LEFT JOIN shuffled_symbols ss3
       ON ss3.dat = ars.dat
      AND ss3.pct_return > ars.pct_return
GROUP BY ars.dat, ars.symbol, ars.pct_return, ars.rank_asc
;
Please note that this query only returns valid ranks if no symbol appears more than once for a given date. This is why I created the shuffled_symbols table with PRIMARY KEY (dat, symbol).
In the ranked_symbols table you get the following results:
SELECT * FROM ranked_symbols;
+---------+--------+------------+----------+-----------+
| dat     | symbol | pct_return | rank_asc | rank_desc |
+---------+--------+------------+----------+-----------+
| 1100101 | AAPL   |       2.10 |        3 |         1 |
| 1100101 | HPQ    |      -0.50 |        1 |         3 |
| 1100101 | IBM    |       1.20 |        2 |         2 |
| 1100102 | AAPL   |      -0.60 |        1 |         3 |
| 1100102 | HPQ    |       1.90 |        3 |         1 |
| 1100102 | IBM    |      -0.02 |        2 |         2 |
+---------+--------+------------+----------+-----------+
6 rows in set (0.00 sec)
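To see how the self-join treats equal returns, the ranking query can be tried on a small in-memory database. This is only a minimal sketch: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the function name rank_by_self_join and the extra MSFT row are made up for illustration.

```python
import sqlite3

# Sample rows for one day; the MSFT row is hypothetical and ties with IBM.
ROWS = [
    (1100101, "IBM", 1.2),
    (1100101, "AAPL", 2.1),
    (1100101, "HPQ", -0.5),
    (1100101, "MSFT", 1.2),
]

# Left self-join ranking: a row's rank is one plus the number of rows on
# the same date with a strictly lower (or strictly higher) pct_return.
RANK_SQL = """
SELECT ars.dat, ars.symbol, ars.pct_return, ars.rank_asc,
       COUNT(ss3.dat) + 1 AS rank_desc
FROM (
    SELECT ss1.dat, ss1.symbol, ss1.pct_return,
           COUNT(ss2.dat) + 1 AS rank_asc
    FROM shuffled_symbols ss1
    LEFT JOIN shuffled_symbols ss2
           ON ss2.dat = ss1.dat AND ss2.pct_return < ss1.pct_return
    GROUP BY ss1.dat, ss1.symbol, ss1.pct_return
) ars
LEFT JOIN shuffled_symbols ss3
       ON ss3.dat = ars.dat AND ss3.pct_return > ars.pct_return
GROUP BY ars.dat, ars.symbol, ars.pct_return, ars.rank_asc
ORDER BY ars.dat, ars.symbol
"""


def rank_by_self_join(rows):
    """Return (dat, symbol, pct_return, rank_asc, rank_desc) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shuffled_symbols ("
        "dat INTEGER NOT NULL, symbol TEXT NOT NULL, "
        "pct_return REAL NOT NULL, PRIMARY KEY (dat, symbol))"
    )
    conn.executemany("INSERT INTO shuffled_symbols VALUES (?, ?, ?)", rows)
    result = conn.execute(RANK_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rank_by_self_join(ROWS):
        print(row)
```

With the tie included, IBM and MSFT both get rank 2 in each direction and the next rank is skipped, so AAPL ends up with rank_asc 4 and HPQ with rank_desc 4. If every row needs a distinct rank, a tie-breaker (for example the symbol) would have to be added to the join conditions.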
Upvotes: 0
Reputation: 51
Below is how I expanded on Brent's example with Perl. This was very helpful to me, and I greatly appreciate the community support.
To run the code from the command line:
rank.pl FromTableNoRank ToTableWithRank pct_return DESC
#!/usr/bin/perl -w
use strict;
use warnings;
use Carp;
use DBI;
# Connect to the database here so that $dbh holds a DBI database handle,
# e.g. my $dbh = DBI->connect(...);
# Not enough command-line arguments: die with a helpful error message.
if (@ARGV != 4) {
    die("$0 requires four arguments. 1.FromTable 2.ToTable 3.Order by value(ex.pct_return) 4.ASC (low = 1) or DESC(high = 1)\n");
}
# Use variables for the insert to minimize errors.
# $OrderBy is the column to rank by.
# $AscDesc declares which way to rank.
my $FromTable = $ARGV[0];
my $ToTable   = $ARGV[1];
my $OrderBy   = $ARGV[2];
my $AscDesc   = $ARGV[3];
# DateTable is a table of dates, used to loop through and rank individual days.
my $query7 = "SELECT dat FROM DateTable ORDER BY dat ASC";
my $sth7 = $dbh->prepare($query7) or croak $dbh->errstr;
$sth7->execute() or croak $sth7->errstr;
# The while loop walks through the dates one at a time.
# The insert sorts one day's rows by the chosen column and adds a row number as the rank.
# The nested statement handle exists because the insert needs the date from the outer fetch.
while (my $ref = $sth7->fetchrow_hashref()) {
    my $dateVar = $ref->{dat};
    print "$dateVar\n";
    # \@rownum is escaped so Perl does not try to interpolate an array;
    # the date is passed as a bound parameter.
    my $query6 = "INSERT INTO $ToTable
        SELECT t.*, \@rownum := \@rownum + 1
        FROM $FromTable t, (SELECT \@rownum := 0) r
        WHERE dat = ?
        ORDER BY $OrderBy $AscDesc";
    my $sth6 = $dbh->prepare($query6) or croak $dbh->errstr;
    $sth6->execute($dateVar) or croak $sth6->errstr;
    $sth6->finish();
}
$sth7->finish();
$dbh->disconnect();
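The per-day loop in the script above can also be written without MySQL user variables, by numbering each day's sorted rows in application code. The following is a rough sketch only: it uses Python's sqlite3 module as a stand-in for MySQL and DBI, the table names from_table and to_table and the function names are hypothetical, and the list of dates is taken from the source table rather than from a separate DateTable.

```python
import sqlite3


def rank_per_day(conn, descending=False):
    """Copy rows from from_table into to_table, adding a per-day row number."""
    # Only these two literals ever reach the SQL text.
    direction = "DESC" if descending else "ASC"
    # Assumption: the distinct dates come from the source table itself.
    dates = [d for (d,) in conn.execute(
        "SELECT DISTINCT dat FROM from_table ORDER BY dat")]
    for dat in dates:
        # The date is bound as a parameter; symbol breaks ties deterministically.
        rows = conn.execute(
            "SELECT dat, symbol, pct_return FROM from_table "
            f"WHERE dat = ? ORDER BY pct_return {direction}, symbol",
            (dat,),
        ).fetchall()
        # Row number within the day, starting at 1.
        ranked = [row + (rank,) for rank, row in enumerate(rows, start=1)]
        conn.executemany("INSERT INTO to_table VALUES (?, ?, ?, ?)", ranked)
    conn.commit()


def demo():
    """Build the sample data from the question and rank it, highest return first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE from_table (dat INTEGER, symbol TEXT, pct_return REAL)")
    conn.execute(
        "CREATE TABLE to_table "
        "(dat INTEGER, symbol TEXT, pct_return REAL, rnk INTEGER)")
    conn.executemany("INSERT INTO from_table VALUES (?, ?, ?)", [
        (1100101, "IBM", 1.2), (1100101, "AAPL", 2.1), (1100101, "HPQ", -0.5),
        (1100102, "IBM", -0.02), (1100102, "AAPL", -0.6), (1100102, "HPQ", 1.9),
    ])
    rank_per_day(conn, descending=True)
    return conn.execute(
        "SELECT dat, symbol, rnk FROM to_table ORDER BY dat, rnk").fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Numbering the rows after an explicit ORDER BY in the client avoids depending on the order in which the database evaluates the user-variable assignment, at the cost of moving each day's rows through the application.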
Upvotes: 0
Reputation: 4509
You can use this syntax to select the row number in your select:
SELECT @row := @row + 1 AS row_num, t.*
FROM your_table t, (SELECT @row := 0) r;
You can then select all values with ORDER BY ascending and descending for each day, and insert them into your table.
Source: http://snippets.dzone.com/posts/show/6831
Example:
INSERT INTO [your table]
SELECT date, symbol, pct_return, @row := @row + 1
FROM [your other table] t, (SELECT @row := 0) r
ORDER BY pct_return ASC;
This gives the ascending ranks. Then run an update on the same table with a similar query to get the descending ranks.
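Because a row counter gives every row in a day a distinct number, the descending rank does not strictly need a second pass: within a day of n rows it equals n + 1 - rank_asc. Here is a small sketch of that arithmetic in plain Python, with the counter restarting for each date; the function name add_ranks is made up for illustration and the data is the sample from the question.

```python
from collections import defaultdict


def add_ranks(rows):
    """Return (dat, symbol, pct_return, rank_asc, rank_desc) for each input row."""
    by_day = defaultdict(list)
    for dat, symbol, pct_return in rows:
        by_day[dat].append((symbol, pct_return))

    ranked = []
    for dat in sorted(by_day):
        # Sort one day's rows by return, lowest first.
        day = sorted(by_day[dat], key=lambda r: r[1])
        n = len(day)
        for rank_asc, (symbol, pct_return) in enumerate(day, start=1):
            # With distinct row numbers, the descending rank mirrors the ascending one.
            ranked.append((dat, symbol, pct_return, rank_asc, n + 1 - rank_asc))
    return ranked


if __name__ == "__main__":
    sample = [
        (1100101, "IBM", 1.2), (1100101, "AAPL", 2.1), (1100101, "HPQ", -0.5),
        (1100102, "IBM", -0.02), (1100102, "AAPL", -0.6), (1100102, "HPQ", 1.9),
    ]
    for row in add_ranks(sample):
        print(row)
```

This shortcut only holds when ranks are plain row numbers. If tied returns should share a rank, the two directions have to be computed separately.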
Upvotes: 1