1110

Reputation: 6839

How to test the performance of a database

I created a database that has the following table for user activities:

user_id        |  INTEGER  |  user being notified
actor_id       |  INTEGER  |  user performing the action
activity_type  |  STRING   |  classname/type of the object being notified
activity_id    |  INTEGER  |  id of the object being notified
context_type   |  STRING   |  classname/type of the object's parent
context_id     |  INTEGER  |  id of the object's parent
read/view_at   |  DATETIME |  timestamp of when user saw it
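
In T-SQL, the table looks roughly like this (a sketch - read_at stands in for the read/view_at column above, since "/" isn't valid in a plain identifier, and the exact types are approximate):

    -- Approximate DDL for the table described above; names and types are my guesses.
    CREATE TABLE dbo.user_activities (
        user_id        INT          NOT NULL,  -- user being notified
        actor_id       INT          NOT NULL,  -- user performing the action
        activity_type  VARCHAR(50)  NOT NULL,  -- classname/type of the object being notified
        activity_id    INT          NOT NULL,  -- id of the object being notified
        context_type   VARCHAR(50)  NULL,      -- classname/type of the object's parent
        context_id     INT          NULL,      -- id of the object's parent
        read_at        DATETIME2    NULL       -- when the user saw it; NULL = unread
    );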

I don't know if this design is good in terms of performance, so I want to test it somehow. I can populate the database with a lot of data (sketched below) and open the page that reads from this table, but I don't know whether that is a relevant test.

Is there a good way to see how this design will perform when there are 10,000 users and 50,000,000 records in the feed table?

I don't know if this helps, but I am using MS SQL Server.
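
For populating, something like this bulk insert is what I had in mind (a sketch against the table above; ABS(CHECKSUM(NEWID())) is a common trick for a pseudo-random integer per row, and cross-joining sys.all_objects produces enough source rows without a loop):

    -- Sketch: ~1M pseudo-random rows per run; repeat (or raise TOP) to reach 50M.
    -- CHOOSE needs SQL Server 2012 or later; 'Thread' is a placeholder context type.
    INSERT INTO dbo.user_activities
        (user_id, actor_id, activity_type, activity_id, context_type, context_id, read_at)
    SELECT TOP (1000000)
        ABS(CHECKSUM(NEWID())) % 10000 + 1,             -- 10,000 distinct users
        ABS(CHECKSUM(NEWID())) % 10000 + 1,             -- actor
        CHOOSE(ABS(CHECKSUM(NEWID())) % 3 + 1, 'Post', 'Comment', 'Like'),
        ABS(CHECKSUM(NEWID())) % 1000000 + 1,
        'Thread',
        ABS(CHECKSUM(NEWID())) % 100000 + 1,
        CASE WHEN ABS(CHECKSUM(NEWID())) % 2 = 0        -- ~half read, half unread
             THEN DATEADD(SECOND, -(ABS(CHECKSUM(NEWID())) % 31536000), SYSDATETIME())
        END
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b;

I realise the distribution matters as much as the row count, so the per-column formulas would need skewing to match real traffic.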

Upvotes: 3

Views: 20312

Answers (3)

Neville Kuyt

Reputation: 29649

If performance is critical, it's something you need to build into your development pipeline.

What I've done on performance-critical projects:

  • Create a performance test data set; in your case, I'd load a table with 100M test records, making the best possible guess at the distribution of the data.
  • Write the queries the application will be using to query that data
  • Write a test script which can execute those queries; I like Gatling.
  • Make a guess at how my test hardware relates to the production kit. Ideally, it's identical - but in practice, that may be too expensive. You'll have to refine that guess based on what you see once you can observe production behaviour.
  • Make a guess on traffic patterns - 10K users isn't that much if they come in a nice, evenly distributed pattern (10,000 requests / 24 hours / 60 minutes / 60 seconds ≈ 0.12 requests per second); if they all turn up at exactly the same time, you have a problem. I typically have 3 load patterns:
    • gradual ramp up (this is "best case"),
    • aggressive ramp-up (this reflects what you'd expect on busy time periods, e.g. if your users tend to visit within a 1 hour period),
    • and worst-case (reflecting what happens if there's a viral video spike or similar)
  • Schedule my Gatling tests to run at least once every day, ideally as part of each check-in
  • Set targets for performance, e.g. "90% of queries in 500ms or less"; when the tests miss those targets, dedicate time and attention to improving performance
  • Once you have access to live infrastructure, run tests
  • Once you have visitors, monitor query performance and adjust your load tests and targets (a quick server-side check is sketched below)
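
For that last point, SQL Server's plan-cache DMVs give a quick server-side view of query timings between load-test runs - a sketch (sys.dm_exec_query_stats reports elapsed times in microseconds, and only covers plans still in cache):

    -- Top statements by average elapsed time, from the plan cache.
    SELECT TOP (20)
        qs.execution_count,
        qs.total_elapsed_time / qs.execution_count / 1000.0 AS avg_elapsed_ms,
        qs.max_elapsed_time / 1000.0                        AS max_elapsed_ms,
        SUBSTRING(st.text,
                  (qs.statement_start_offset / 2) + 1,
                  ((CASE qs.statement_end_offset
                        WHEN -1 THEN DATALENGTH(st.text)
                        ELSE qs.statement_end_offset
                    END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY avg_elapsed_ms DESC;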

Upvotes: 2

Pantelis Natsiavas

Reputation: 5369

Regarding performance, no assumptions are safe. You should follow these steps:

  • Define performance goals regarding load and response time. You should identify the acceptable response times for a given data load.

  • Define resources. Not every machine configuration will behave the same. You should build a Virtual Machine with the target configuration.

  • Execute load testing. You could use a tool like SQL Server load generator; even a crude timing loop (sketched below) gives you a first baseline.
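
A minimal sketch of such a loop, assuming the table and a "latest activities for a user" query from the question (a real load test should run many such loops concurrently, from separate connections):

    -- Crude single-connection timing loop over 100 runs of the feed query.
    DECLARE @i INT = 0, @user_id INT, @dummy INT,
            @t0 DATETIME2, @total_ms BIGINT = 0;

    WHILE @i < 100
    BEGIN
        SET @user_id = ABS(CHECKSUM(NEWID())) % 10000 + 1;  -- random user per run
        SET @t0 = SYSDATETIME();

        -- Assigning to a variable suppresses the client result set,
        -- so the loop mostly measures server-side work.
        SELECT TOP (20) @dummy = activity_id
        FROM dbo.user_activities
        WHERE user_id = @user_id
        ORDER BY read_at DESC;

        SET @total_ms += DATEDIFF(MILLISECOND, @t0, SYSDATETIME());
        SET @i += 1;
    END;

    SELECT @total_ms / 100.0 AS avg_ms_per_query;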

Hope this helps!

Upvotes: 0

Branko Dimitrijevic

Reputation: 52157

This has worked for me so far:

  1. Identify the set of queries that are performance-critical.1
  2. Design the database so the DBMS can use fast execution plans for the queries identified above.2
  3. Fill the database with representative amounts of test data and make sure the query execution plans anticipated above are really being used (and the performance of each individual query is satisfactory); a sketch follows this list.
  4. Benchmark the whole "vertical stack" of your software. By now, there shouldn't be too many bad surprises.
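
To illustrate steps 2 and 3: suppose the critical query is "latest activities for a given user". A covering index keyed on that access path lets the DBMS seek instead of scan (a sketch - the index, column names and query are illustrative, not taken from the question):

    -- Hypothetical covering index for a "latest activities for user X" query.
    CREATE NONCLUSTERED INDEX IX_user_activities_user_read
        ON dbo.user_activities (user_id, read_at DESC)
        INCLUDE (actor_id, activity_type, activity_id, context_type, context_id);

    -- Verify the plan and per-query cost (expect a seek on the index above).
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT TOP (20) actor_id, activity_type, activity_id
    FROM dbo.user_activities
    WHERE user_id = 42
    ORDER BY read_at DESC;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;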

As you can see, database design is not just about fulfilling business requirements. Indeed, understanding how clients intend to access the data is an integral part of the design process.

In other words, what is traditionally thought of as "logical design" is not enough - there needs to be a "two-way street" between logical and physical design.

Is there a good way to see how this design will perform when there are 10,000 users and 50,000,000 records in the feed table?

That sentence doesn't contain enough information to be answered.

First tell us how you intend to access the data, then perhaps we can help you (re)design the database to support that particular access pattern efficiently.


1 For example, if "loading a page" is critical for the user experience (which it usually is), make a list of queries that will typically be executed during that process.

2 Which includes "physical" techniques such as indexing, clustering, partitioning etc., but can also include some aspects that are traditionally thought of as "logical" design, such as the design of keys and relationships. If you understand the underlying data structures that the typical DBMS uses, you'll be able to predict the performance consequences of most of your design decisions. Use The Index, Luke! is an excellent introduction on the topic.

Upvotes: 1
