A more effective way to structure database for pivots

Question

To describe the question as good as possible, I'll detail the actual scenario: The site will allow users to add IP's for monitoring them against RBL's. In doing so, I've been pondering different approaches to structuring the database in a manner that is as optimal as possible in terms of efficiency for large IP blocks, whilst still being workable.

The project is built on Laravel, and I've set up a database structure containing:

table_a

Contains information about the IP we monitor.

id (auto inc, primary)

name (varchar, 128) Friendly name for the monitor

ip (varchar, 16) The IP to monitor

email (varchar, 128) An e-mail for notifications

notifications (tinyint, 1) A toggle for notifications

timestamps

-

table_b

Contains information about the RBL's we monitor against.

id (auto inc, primary)

url (varchar, 255) The URL for the monitor

active (tinyint, 1) Toggle for whether we actively check toward it or not

-

table_a_b

A pivot table to maintain the status of each RBL in.

table_a_id (int, 10) Foreign key to id on table_a

table_b_id (int, 10) Foreign key to id on table_b

listed (tinyint, 1) Whether or not the IP is listed on this RBL

notified (tinyint, 1) Whether or not we've already notified the user

So the way it currently works is that when adding an IP, it adds the IP into table_a, as well as (number of rows in table_b * number of IP's) into the pivot table. Whilst adding it doesn't take that long - the issue I see is that adding a /24 of IP's (256 IP's) tracked against (currently) 87 RBL's creates a grand total of 22,272 records. That's for a /24. A /22 (1024 IP's) would be 89,088 records. That's a pretty large amount for a single user, and I can see how that'd quickly destroy database performance.

The alternative I've thought of is to keep a column named listed_on on table_a which is a list of any RBL they are listed in. The row would contain something along the line of 55|32|11 - which would be simple enough to parse in PHP. Still, with a large number of users, I can see that hitting PHP performance instead with a large amount of string handling.

Am I missing an obvious solution here, or are these two ones (perhaps especially the latter) the best bet?

Cheers!

philipxy · Accepted Answer

The straightforward relational design (instead of table_a_b(ip, rbl, ...)) is tables listed(ip, rbl) "ip IP is listed on rbl RBL" and notified(ip, rbl) "user has been notifed that ip IP is listed on rbl RBL". Relational tables are intended to hold the rows that make a true proposition (statement) from some table-specific predicate (sentence template parameterized by columns). Tables that should contain a row for every possible value for a column or columns are rare.

Whether instead a design should be used that combines these tables depends on your predicates and the constraints that follow from applying them to all the situations that can arise. Eg if notifications only ever happen to listed IP-RBL pairs then maybe best is ip_rbl(ip, rbl, notified) "ip IP is in rbl RBL and NOTIFIED is whether user has been notified". (Here the tradeoff is more smaller tables & more joins vs fewer larger tables & more searching.)

PS Pivot tables are not generally best for manipulating & querying data compared to the unpivoted data. They can be appropriate for formatting final output to humans or for moving between data and metadata. See this news post. (I saw your pivot table as unpivoted since the boolean column names aren't data values even though it is a table that is total on IP-RBL.)

A more effective way to structure database for pivots

Answers (1)

Related Questions