Reputation: 472
For a simple web application, the main requirement is to process around 30 million records (10 million rows × 3 tables) as fast as possible. I haven't worked with this amount of data before, so I would like some suggestions/advice from experienced people.
The database will hold details of businesses. Around 25 attributes will describe a single business: name, address, etc. The table structure is as follows.
CREATE TABLE IF NOT EXISTS `businesses` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `type` int(2) NOT NULL,
  `organisation` varchar(40) NOT NULL,
  `title` varchar(12) NOT NULL,
  `given_name` varchar(40) NOT NULL,
  `other_name` varchar(40) NOT NULL,
  `family_name` varchar(40) NOT NULL,
  `suffix` varchar(5) NOT NULL,
  `reg_date` date NOT NULL,
  `main_trade_name` varchar(150) NOT NULL,
  `son_address_l1` varchar(50) NOT NULL,
  `son_address_l2` varchar(50) NOT NULL,
  `son_address_suburb` int(3) NOT NULL,
  `son_address_state` int(2) NOT NULL,
  `son_address_postcode` varchar(10) NOT NULL,
  `son_address_country` int(3) NOT NULL,
  `bus_address_l1` varchar(50) NOT NULL,
  `bus_address_l2` varchar(50) NOT NULL,
  `bus_address_suburb` int(3) NOT NULL,
  `bus_address_state` int(2) NOT NULL,
  `bus_address_postcode` varchar(10) NOT NULL,
  `bus_address_country` int(3) NOT NULL,
  `email` varchar(165) DEFAULT NULL,
  `phone` varchar(12) NOT NULL,
  `website` varchar(80) NOT NULL,
  `employee_size` int(4) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `type` (`type`),
  KEY `phone` (`phone`),
  KEY `reg_date` (`reg_date`),
  KEY `son_address_state` (`son_address_state`),
  KEY `bus_address_state` (`bus_address_state`),
  KEY `son_address_country` (`son_address_country`),
  KEY `bus_address_country` (`bus_address_country`),
  FULLTEXT KEY `title` (`title`),
  FULLTEXT KEY `son_address_l1` (`son_address_l1`),
  FULLTEXT KEY `son_address_l2` (`son_address_l2`),
  FULLTEXT KEY `bus_address_l1` (`bus_address_l1`),
  FULLTEXT KEY `bus_address_l2` (`bus_address_l2`)
) ENGINE=MyISAM;
There are going to be two other tables with the same structure, because each business's details will be present in three sources (for comparison purposes). Only one of the tables will receive writes.
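For illustration only, a cross-source comparison could look like the query below; `businesses_src2` and `businesses_src3` are assumed names for the two read-only copies, since the question doesn't name them.

-- Hypothetical comparison of one business across the three source tables,
-- joined on the shared primary key.
SELECT a.id,
       a.main_trade_name AS src1_name,
       b.main_trade_name AS src2_name,
       c.main_trade_name AS src3_name
FROM businesses AS a
JOIN businesses_src2 AS b ON b.id = a.id
JOIN businesses_src3 AS c ON c.id = a.id
WHERE a.id = 12345;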
About the app usage:
My questions are:
Thank you.
Upvotes: 4
Views: 2498
Reputation: 65496
I cannot answer your direct question, but I do have experience working with large datasets.
The first thing I would work out is what the majority use case is (in your case, search) and what operations it involves, and then consider data storage/partitioning based on that.
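As an illustrative sketch only (not a recommendation for this schema), if measuring showed that most searches filter on `bus_address_state`, a MySQL hash-partitioned layout might look like the following; the table name and partition count are made up, and the column list is trimmed.

-- Sketch: partition by the column the majority of searches filter on
-- (assumed here to be bus_address_state). MySQL requires the partitioning
-- column in every unique key, hence the composite primary key, and
-- partitioned tables cannot carry FULLTEXT indexes, so those are omitted.
CREATE TABLE IF NOT EXISTS `businesses_by_state` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `organisation` varchar(40) NOT NULL,
  `main_trade_name` varchar(150) NOT NULL,
  `bus_address_state` int(2) NOT NULL,
  PRIMARY KEY (`id`, `bus_address_state`)
) ENGINE=MyISAM
PARTITION BY HASH (`bus_address_state`)
PARTITIONS 8;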
Next, measure, measure, and measure again. Some database systems work well with one kind of operation and poorly with others, and as the amount of data and the operational complexity grow, things that worked well may start to degrade. This is why you measure: don't try to design this without good evidence of how the database systems you're using behave under these loads.
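A minimal way to gather that evidence in MySQL is to prefix each candidate query with EXPLAIN and compare the plans (the filter values below are arbitrary examples):

-- Check whether a candidate search actually uses the intended indexes.
EXPLAIN
SELECT id, organisation, main_trade_name
FROM businesses
WHERE bus_address_state = 5
  AND reg_date >= '2010-01-01';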
Then work iteratively to add more operations.
Don't try to design a best fit for everything. As your design and research are distilled, you'll see places where optimisations may be needed or available. You may also find, as we have in the past, that different types of caching and indexing may be needed at different times.
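For instance, a sketch of such a later indexing change, assuming measurements showed that address searches usually span both address lines: the per-column FULLTEXT keys could be merged into combined ones.

-- Sketch: replace the per-column FULLTEXT keys with a combined one
-- (MyISAM, as in the question's schema).
ALTER TABLE businesses
  DROP INDEX son_address_l1,
  DROP INDEX son_address_l2,
  ADD FULLTEXT KEY son_address (son_address_l1, son_address_l2);

-- MATCH must then name the same column list as the combined index.
SELECT id, main_trade_name
FROM businesses
WHERE MATCH (son_address_l1, son_address_l2) AGAINST ('george street');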
Good luck - sounds like an interesting project.
Upvotes: 5