Reputation: 199
I have a table looking like this:
mysql> explain test_table;
+---------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+-------------+------+-----+---------+-------+
| timestamp | datetime | NO | PRI | NULL | |
| id | varchar(64) | NO | PRI | NULL | |
| px_last | float | NO | | NULL | |
| twap | float | YES | | NULL | |
+---------------------+-------------+------+-----+---------+-------+
For a given symbol and timestamp, I want to shift the value of the field twap
back one timestep.
This table:
mysql> select * from test_table;
+---------------------+-------+---------+-------+
| timestamp | id | px_last | twap |
+---------------------+-------+---------+-------+
| 2011-01-01 00:00:00 | apple | 101 | 101.1 |
| 2011-01-01 00:00:00 | pear | 50 | 50.1 |
| 2011-01-02 00:00:00 | apple | 102 | 112.8 |
| 2011-01-02 00:00:00 | pear | 51 | 57.3 |
| 2011-01-03 00:00:00 | pear | 52 | 59.1 |
| 2011-01-03 00:00:00 | apple | 103 | 104.1 |
+---------------------+-------+---------+-------+
should end up like this:
mysql> select * from test_table;
+---------------------+-------+---------+-------+
| timestamp | id | px_last | twap |
+---------------------+-------+---------+-------+
| 2011-01-01 00:00:00 | apple | 101 | 112.8 |
| 2011-01-01 00:00:00 | pear | 50 | 57.3 |
| 2011-01-02 00:00:00 | apple | 102 | 104.1 |
| 2011-01-02 00:00:00 | pear | 51 | 59.1 |
| 2011-01-03 00:00:00 | pear | 52 | NULL |
| 2011-01-03 00:00:00 | apple | 103 | NULL |
+---------------------+-------+---------+-------+
My first approach (using this example: How to number rows...) would be to create a row number (1, 2....n) within each symbol, then copy row number, symbol and twap to a temporary table, decrease row number with 1 at each row in the temporary table and then copy the data back to the original table.
Is there a way to do this without creating the temporary table? The table is huge, > 50 million rows and growing, so finding the largest timestamp that is smaller than a given timestamp is not efficient enough.
Upvotes: 0
Views: 265
Reputation: 183466
You can write:
UPDATE test_table tt1
SET tt1.twap =
( SELECT tt2.twap
FROM ( SELECT timestamp,
id,
twap
FROM test_table
) tt2
WHERE tt2.timestamp = DATE_ADD(tt1.timestamp, INTERVAL 1 DAY)
AND tt2.id = tt1.id
)
;
Technically speaking, this does create a temporary table (as a workaround for the dreaded ERROR 1093 (HY000): You can't specify target table 'tt1' for update in FROM clause
), but at least it's all handled implicitly by that inner subquery, rather than requiring you to create one as a separate step.
For a large table, I would not expect this to perform very well, but you can break it into smaller updates by handling just one time-range at a time (provided you start with the earliest time-range and move futureward, so that you never overwrite data you're about to use). Your individual statements would then look like this:
UPDATE test_table tt1
SET tt1.twap =
( SELECT tt2.twap
FROM ( SELECT timestamp,
id,
twap
FROM test_table
WHERE timestamp BETWEEN TIMESTAMP '2011-01-02 00:00:00'
AND TIMESTAMP '2011-02-01 23:59:59'
) tt2
WHERE tt2.timestamp = DATE_ADD(tt1.timestamp, INTERVAL 1 DAY)
AND tt2.id = tt1.id
)
WHERE tt1.timestamp BETWEEN TIMESTAMP '2011-01-01 00:00:00'
AND TIMESTAMP '2011-01-31 23:59:59'
;
Upvotes: 1