Cameron R
Cameron R

Reputation: 149

Is there a standard for keeping extra info about a MySQL table?

My task involves reading from a constantly updated MySQL table created with:

CREATE TABLE mailing (
    addr VARCHAR(255) NOT NULL
);

(no date_created fields or anything, kind of arbitrary, I know, but this is what I am given)

This table is supposed to hold millions of email addresses, and more will be added daily.

I am supposed to count the daily growth from the last 30 days of all the domains in this table in a separate one, which I create with:

CREATE TABLE domain_count (
    domain VARCHAR(255) NOT NULL PRIMARY KEY,
    total_count INT(11) NOT NULL DEFAULT 0,
    count_1 INT(11) NOT NULL DEFAULT 0,
    count_2 INT(11) NOT NULL DEFAULT 0,
    ...
    count_30 INT(11) NOT NULL DEFAULT 0
);

If say, on day 1 mailing table contains 10 addresses from a single domain, and on day 2 it contains 15 addresses, then I'd like for domain_count.count_1 = 10 and domain_count.count_2 = 5

To that end, I need to keep track of 2 things: one is the total number of addresses in the mailing table as of yesterday (so for day 2 I should have "10" stored somewhere, and for day 3, I should have "15" stored somewhere; with this I can resume searching from the 11th/16th column of the mailing table). The other is the latest count_# value since I need to know to insert into count_2 on day 2 (and loop around back to count_1 on day 31).

I can store these values in a separate file easily, but it feels really messy doing something like that. Is there a MySQL query that would let me associate those values with the tables themselves?

Upvotes: 2

Views: 122

Answers (2)

cmbuckley
cmbuckley

Reputation: 42458

To answer your specific question: such aggregation and denormalisation of your data for reporting is something that would typically be covered by a data warehouse rather than by the DBMS.

To help with your problem, a neater structure might be the following:

CREATE TABLE `domain_count` (
    `domain` VARCHAR(255) NOT NULL PRIMARY KEY,
    `date` DATE NOT NULL,
    `count` INT(11) NOT NULL DEFAULT 0
);

It's still separating an aggregation of the data from the data itself, but you might find the structure easier to query for your requirements. You can use date range / BETWEEN operators to get the ranges that you're interested in, and SUM the counts or use GROUP BY to group by domain, month etc.

Upvotes: 1

Bill Karwin
Bill Karwin

Reputation: 562310

There's no standard for what you describe, no.

There's INFORMATION_SCHEMA, which is part of the SQL standard. The TABLES table has a column TABLE_ROWS, but this only shows the current number of rows.

To do what you describe, I'd make another conventional table based on the TABLES table, add a TIMESTAMP column, and periodically copy rows from the I_S table to my own.

CREATE TABLE mydatabase.TABLES like INFORMATION_SCHEMA.TABLES;

ALTER TABLE mydatabase.TABLES ADD COLUMN updated_at TIMESTAMP;

/* once per day do the following: */    
INSERT INTO mydatabase.TABLES 
 SELECT *, NOW() FROM INFORMATION_SCHEMA.TABLES
 WHERE (table_schema, table_name) = ('mydatabase', 'mytable');

Upvotes: 1

Related Questions