Reputation: 1669
I have a HISTORY
table with 9 million records. I need to find year-wise, month-wise records created. I was using query no 1, However it timed out several times.
SELECT
year(created) as year,
MONTHNAME(created) as month,
count(*) as ymcount
FROM
HISTORY
GROUP BY
year(created), MONTHNAME(created);
I decided to add where year(created)
, this time the query took 30 mins (yes it takes so long) to execute.
SELECT
year(created) as year,
MONTHNAME(created) as month,
count(*) as ymcount
FROM
HISTORY
WHERE
year(created) = 2010
GROUP BY
year(created), MONTHNAME(created) ;
I was planning to add an index on created
timestamp column, however before doing so, I need the opinion (since its going to take a long time to index such a huge table).
Will adding an index on created(timestamp)
column improve performance, considering year function is used on the column?
Upvotes: 3
Views: 10765
Reputation: 1103
As been stated before, in your case, an index won't be used because the index is created on the column 'created' and you are querying on 'year(created)'.
What you can do is add two generated columns year_gen = year(create) and month_gen = MONTHNAME(created) to your table and index these two columns. The DB2 Query Optimizer will automatically use these two generated columns and it will also use the indices created on these columns.
The code should be something like (but not 100% sure since I have no DB2 to test)
SET INTEGRITY FOR HISTORY OFF CASCADE DEFERRED @
ALTER TABLE HISTORY ADD COLUMN YEAR_GEN SMALLINT GENERATED ALWAYS AS (YEAR(CREATE)),
ADD COLUMN MONTH_GEN VARCHAR(20) GENERATED ALWAYS AS (YEAR(CREATE)) @
SET INTEGRITY FOR HISTORY IMMEDIATE CHECKED FORCE GENERATED @
CREATE INDEX HISTORY_YEAR_IDX ON HISTORY YEAR_GEN ASC CLUSTER @
CREATE INDEX HISTORY_MONTH_IDX ON HISTORY YEAR_GEN ASC @
Just a sidenote: the set integrity off
is mandatory to add generated columns. Your table is inaccessible untill you reset the integrity to checked
and you force the re-calculation of the generated columns (this might take a while in your case).
Setting integrity off without cascade deferred
will set every table with a foreign key to the HISTORY table to OFF too. You will have to manually reset the integrity of these tables too. If I remember correctly, using cascade deferred
in combination with incomming foreign keys may cause DB2 to set the integrity of your table to 'checked by user'.
Upvotes: 0
Reputation: 13056
You want what's known as a Calendar Table (the particular example uses SQL Server, but the solution should be adaptable). Then, you want lots of indices on it (since writes are few, and this is a primary dimension table for analysis).
Assuming you have a minimum Calendar Table that looks like this:
CREATE TABLE Calendar (isoDate DATE,
dayOfMonth INTEGER,
month INTEGER,
year INTEGER);
... with an index over [dayOfMonth
, month
, year
, isoDate
], your query can be re-written like this:
SELECT Calendar.year, Calendar.month,
COUNT(*) AS ymCount
FROM Calendar
JOIN History
ON History.created >= Calendar.isoDate
AND History.created < Calendar.isoDate + 1 MONTH
WHERE Calendar.dayOfMonth = 1
GROUP BY Calendar.year, Calendar.month
The WHERE Calendar.dayOfMonth = 1
is automatically limiting results to 12-per-year. The start of the range is trivially located with the index (given the SARGable data), and the end of the range as well (yes, doing math on a column generally disqualifies indices... on the side the math is used. If the optimizer is at all smart it's going to going to gen a virtual intermediate table containing the start/end of range).
So, index-based (and likely index-only) access for the query. Learn to love indexed dimension tables, that can be used for range queries (Calendar Tables being one of the most useful).
Upvotes: 1
Reputation: 74028
An index will be used, when it helps narrow down the number of rows read.
It will also be used, when it avoids reading the table at all. This is the case, when the index contains all the columns referenced in the query.
In your case the only column referenced is created
, so adding an index on this column should help reducing the necessary reads and improve the overall runtime of your query. However, if created
is the only column in the table, the index won't change anything in the first query, because it doesn't reduce the number of pages to be read.
Even with a large table, you can test, if an index makes a difference. You can copy only part of the rows to a new table and compare the execution plans on the new table with and without an index, e.g.
insert into testhistory
select *
from history
fetch first 100000 rows only
Upvotes: 1
Reputation: 5636
An index won't really help because you have formed the query such that it must perform a complete table scan, index or no index. You have to form the where
clause so it is in the form:
where field op constant
where field
is, of course, your field; op
is = <= => <> between in
, etc. and constant is either a direct constant, 42
, or an operation that can be executed once and the result cached, getdate()
.
Like this:
where created >= DateFromParts( @year, 1, 1 )
and created < DateFromParts( @year + 1, 1, 1 )
The DateFromParts
function will generate a value which remains in effect for the duration of the query. If created
is indexed, now the optimizer will be able to seek to exactly where the correct dates start and tell when the last date in the range has been processed and it can stop. You can keep year(created)
everywhere else -- just get rid of it from the where
clause.
This is called sargability and you can google all kinds of good information on it.
P.S. This is in Sql Server format but you should be able to calculate "beginning of specified year" and "beginning of year after specified year" in whatever DBMS you're using.
Upvotes: 4
Reputation: 2975
I'll assume you are using SQL Server based on your tags.
Yes, the index will make your query faster.
I recommend only using the 'created' column as a key for the index and to not include any additional columns from the History table because they will be unused and only result in more reads than what is necessary.
And of course, be mindful when you create indexes on tables that have a lot of INSERT, UPDATE, DELETE activity as your new index will make these actions more expensive when being performed on the table.
Upvotes: 0