Reputation: 41

Need best approach for database architecture madness

We are building a scheduling system for our startup.

It's just an ordinary one, except for a "auto-find" feature we are willing to implement. LAMP architecture. Nothing special.

This is how DB looks. Three main tables:

Offices (id, start_time, stop_time)
People (id, office_id, start_time, stop_time)
Schedule (id, people_id, start_time, stop_time)

start_time/stop_time are TIMESTAMPS.

Tables don't need to be this way. This is just what we currently have.

Offices table has open/close time for an office. This table could be as large as 365 days per office, as open/close time is not the same everyday. Note that it could be up to 1000 offices. This makes roughly 365,000+ records in table.

People have join/leave times. It's obviously more restrictive than office. Again, every single day of the year people can have diferent visiting hours. Each office has around 50 people. This makes 1000 offices * 365 days * 50 employees = 18,250,000 records.

Schedule is who will meet who. Each person might have up to 10 meeting/day. Yes, at this point this could easily make 182,5 million rows in this table.

Nothing strange apart from big numbers. What the application needs to do is: given an office, person to meet and a duration, show first 5 dates available.

From what we believe, this app will totally KILL our server. We are just desperate to make this run. First thing we thought was "this is not possible at all". But hey! everything is possible in software, isn't it?

PS: If someone thinks of a better approach that makes the app viable, we would REALLY appreciate it.

Thanks a lot for reading. Hope some hardcore programmer could lend us a hand.

UPDATE:

For testing purposes we've created two exactly same tables:

meetings & offices (id, profesional, start, stop).

ID is primary, the rest is BTREE index. The SQL is something like this (which doesn't work 100%):

SELECT a.profesional, a.stop AS desde, Min(b.start) AS hasta 
FROM meetings AS a 
  JOIN meetings AS b 
    ON  a.profesional=b.profesional 
    AND a.stop < b.start 
WHERE a.profesional = 1 
  AND b.profesional = 1 
GROUP BY a.start 

UNION 

SELECT m.profesional, MIN(m.start), MIN(j.start) 
FROM offices m 
  JOIN meetings j 
    ON  j.profesional = m.profesional 
WHERE j.profesional = 1 
  AND m.profesional = 1 

UNION 

SELECT m.profesional, MAX(j.stop), MAX(m.stop) 
FROM offices m 
  JOIN meetings j 
    ON  j.profesional = m.profesional 
WHERE j.profesional = 1 
  AND m.profesional = 1 

ORDER BY desde ASC

What we've done is the following. Add just 1 office with 240 days. Each day has 8 meetings which makes a total of about 2000 rows. It takes 2.6 (!) seconds to perform such query. Is the query wrong? Can it be re-written?

Upvotes: 4

Answers (2)

ypercubeᵀᴹ

Reputation: 115550

Your application seems to need one crucial query. Find the intervals defined by

 (OfficeOpenIntervals INTERSECT PeopleAtOfficeIntervals) MINUS ScheduleIntervals

and search in these intervals, near or after a certain date.

Using appropriate indices and restricting the query (searching for one person only, for the next 60 days, etc) will probably be fine. Handling time intervals operations is tricky but you can test with various indices and ways to write the queries.

Another option (if you test and find no efficient way through indexing) is to have a separate AvailableSlots table which would be at first, when there is no scheduled appointments, populated with all available days a person is in the office (that would be the OfficeOpenIntervals INTERSECT PeopleAtOfficeIntervals). Then every time a appointment is added in Schedule, the corresponding row in this AvailableSlots table would be either deleted, updated or split in two rows that would store the remaining available intervals for the person who was scheduled for meeting.

So, the query to show first 5 dates available would only have to search in this table only.

This is not a normalized solution and the integrity has to maintained by stored procedures (for all operations like a schedule added, a person leaving an office, starting, etc). The initial population could also take time - and space - but you don't have to populate the table for a hundred years. It could be for just a few months and additional populations can be done later (once per month or per year or when needed).

Upvotes: 0

Matti Virkkunen

Reputation: 65126

If you're given a person, doesn't that already reduce the number of schedule rows to consider down by a factor of 50000? The number of office rows will also boil down to a couple of hundred if you only consider the given office. A proper index will find you those rows in no time.

Also, do people really schedule a full year of meetings in advance, or is it more likely you will only have a fully booked database for a month or two into the future? You can always move old data into an archive if you start having performance problems with your main database.

Also, with "up to" estimates it's easy to think too big. You should rather try to figure out how many people each office will have on average and how many meetings they will have a day on average. "Up to 10 meetings a day" might easily turn into "usually two a day". Depends on what kind of business we're talking about, of course.

And don't forget to subtract weekends. They make up 2/7 of the year.

Upvotes: 6

Need best approach for database architecture madness

Answers (2)

Related Questions