Reputation: 149
My task involves reading from a constantly updated MySQL table created with:
CREATE TABLE mailing (
addr VARCHAR(255) NOT NULL
);
(no date_created fields or anything, kind of arbitrary, I know, but this is what I am given)
This table is supposed to hold millions of email addresses, and more will be added daily.
I am supposed to count the daily growth from the last 30 days of all the domains in this table in a separate one, which I create with:
CREATE TABLE domain_count (
domain VARCHAR(255) NOT NULL PRIMARY KEY,
total_count INT(11) NOT NULL DEFAULT 0,
count_1 INT(11) NOT NULL DEFAULT 0,
count_2 INT(11) NOT NULL DEFAULT 0,
...
count_30 INT(11) NOT NULL DEFAULT 0
);
If say, on day 1 mailing table contains 10 addresses from a single domain, and on day 2 it contains 15 addresses, then I'd like for domain_count.count_1 = 10 and domain_count.count_2 = 5
To that end, I need to keep track of 2 things: one is the total number of addresses in the mailing table as of yesterday (so for day 2 I should have "10" stored somewhere, and for day 3, I should have "15" stored somewhere; with this I can resume searching from the 11th/16th column of the mailing table). The other is the latest count_# value since I need to know to insert into count_2 on day 2 (and loop around back to count_1 on day 31).
I can store these values in a separate file easily, but it feels really messy doing something like that. Is there a MySQL query that would let me associate those values with the tables themselves?
Upvotes: 2
Views: 122
Reputation: 42458
To answer your specific question: such aggregation and denormalisation of your data for reporting is something that would typically be covered by a data warehouse rather than by the DBMS.
To help with your problem, a neater structure might be the following:
CREATE TABLE `domain_count` (
`domain` VARCHAR(255) NOT NULL PRIMARY KEY,
`date` DATE NOT NULL,
`count` INT(11) NOT NULL DEFAULT 0
);
It's still separating an aggregation of the data from the data itself, but you might find the structure easier to query for your requirements. You can use date range / BETWEEN operators to get the ranges that you're interested in, and SUM the counts or use GROUP BY to group by domain, month etc.
Upvotes: 1
Reputation: 562310
There's no standard for what you describe, no.
There's INFORMATION_SCHEMA, which is part of the SQL standard. The TABLES table has a column TABLE_ROWS
, but this only shows the current number of rows.
To do what you describe, I'd make another conventional table based on the TABLES table, add a TIMESTAMP column, and periodically copy rows from the I_S table to my own.
CREATE TABLE mydatabase.TABLES like INFORMATION_SCHEMA.TABLES;
ALTER TABLE mydatabase.TABLES ADD COLUMN updated_at TIMESTAMP;
/* once per day do the following: */
INSERT INTO mydatabase.TABLES
SELECT *, NOW() FROM INFORMATION_SCHEMA.TABLES
WHERE (table_schema, table_name) = ('mydatabase', 'mytable');
Upvotes: 1