Reputation: 10875
Recently I have been trying to optimize my tables, mainly because I've learned a lot more about database design through some courses at my school. I also chose to do this because I've been getting a lot of timeouts on some queries, and I have since found out that the cause was indeed my bad database design.
So basically, I will be doing SELECT, UPDATE, INSERT and DELETE on this table.
Here is my current database schema:
-- ----------------------------
-- Table structure for `characters_items`
-- ----------------------------
DROP TABLE IF EXISTS `characters_items`;
CREATE TABLE `characters_items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`master_id` int(10) unsigned NOT NULL DEFAULT '0',
`item_id` smallint(6) NOT NULL,
`amount` int(11) NOT NULL,
`slot_id` smallint(9) NOT NULL DEFAULT '0',
`type` tinyint(4) NOT NULL DEFAULT '0',
`extra_data` text,
PRIMARY KEY (`id`),
KEY `master_id` (`master_id`),
CONSTRAINT `characters_items_ibfk_1` FOREIGN KEY (`master_id`) REFERENCES `characters` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=904 DEFAULT CHARSET=latin1;
In my program, I will be manipulating large amounts of data (up to 500 rows at a time; as you can see, this is a table for all character items).
I have also learned that indexes will slow your queries down if you are doing data manipulation.
Here are some of the queries I will be using:
try
{
    StringBuilder query = new StringBuilder();
    client.ClearParameters();
    client.AddParameter("master_id", this.owner.MasterId);
    client.AddParameter("type", (byte)CharacterItemType.Bank);
    client.AddParameter("capacity", this.Capacity);

    // Grab the original items.
    DataRow[] data = client.ReadDataTable("SELECT item_id,amount,slot_id FROM characters_items WHERE master_id=@master_id AND type=@type LIMIT @capacity").Select();
    Item[] originalItems = new Item[this.Capacity];
    if (data != null && data.Length > 0)
    {
        for (short i = 0; i < data.Length; i++)
        {
            DataRow row = data[i];
            short id = (short)row[0];
            int count = (int)row[1];
            short slotId = (short)row[2];
            originalItems[slotId] = new Item(id, count);
        }
    }

    // Now we compare the items to see if anything has been changed.
    Item[] items = this.ToArray();
    for (short i = 0; i < items.Length; i++)
    {
        Item item = items[i];
        Item original = originalItems[i];
        // Item was added.
        if (item != null && original == null)
        {
            query.Append("INSERT INTO characters_items (master_id,item_id,amount,slot_id,type,extra_data) ");
            query.Append("VALUES (");
            query.Append(this.owner.MasterId);
            query.Append(",");
            query.Append(item.Id);
            query.Append(",");
            query.Append(item.Count);
            query.Append(",");
            query.Append(i);
            query.Append(",");
            query.Append((byte)CharacterItemType.Bank);
            string extraData = item.SerializeExtraData();
            if (extraData != null)
            {
                query.Append(",'");
                query.Append(extraData);
                query.Append("'");
            }
            else
            {
                query.Append(",null");
            }
            query.Append(");");
        }
        // Item was deleted.
        else if (item == null && original != null)
        {
            query.Append("DELETE FROM characters_items WHERE slot_id=");
            query.Append(i);
            query.Append(" AND master_id=");
            query.Append(this.owner.MasterId);
            query.Append(" AND type=");
            query.Append((byte)CharacterItemType.Inventory);
            query.Append(" LIMIT 1;");
        }
        // Item was modified.
        else if (item != null && original != null)
        {
            if (item.Id != original.Id || item.Count != original.Count)
            {
                query.Append("UPDATE characters_items SET item_id=");
                query.Append(item.Id);
                query.Append(",amount=");
                query.Append(item.Count);
                string extraData = item.SerializeExtraData();
                if (extraData != null)
                {
                    query.Append(",extra_data='");
                    query.Append(extraData);
                    query.Append("'");
                }
                else
                {
                    query.Append(",extra_data=null");
                }
                query.Append(" WHERE master_id=@master_id AND type=@type AND slot_id=");
                query.Append(i);
                query.Append(";");
            }
        }
    }

    // If a query was actually built, we will execute it.
    if (query.Length > 0)
    {
        client.SetConnectionTimeout(60);
        client.ExecuteUpdate(query.ToString());
        return true;
    }
}
catch (Exception ex)
{
    Program.Logger.PrintException(ex);
}
return false;
As you can see, I am almost always referencing the slot_id, type, and master_id fields. If I made slot_id and type indexed fields as well, how would that affect my overall data-manipulation performance? Would it be affected positively or negatively?
Please give me some advice (except on the C# code, I will be fixing it up later!)
Upvotes: 0
Views: 657
Reputation: 108410
For optimum performance of the given UPDATE and DELETE statements, I'd recommend:
ALTER TABLE characters_items
ADD KEY characters_items_IX1 (master_id, type, slot_id);
For optimum performance of the SELECT statement, as well as the DML statements, the index can be modified to include two additional columns:
ALTER TABLE characters_items
ADD KEY characters_items_IX1 (master_id, type, slot_id, item_id, amount);
(NOTE: you'd add only one of those indexes; you don't need both.)
We observe that your UPDATE and DELETE statements specify equality predicates on all three columns. In this case, you would want the columns in the index ordered from highest cardinality to lowest. (That is, the column with the greatest number of distinct values, i.e. the highest selectivity, first, followed by the other columns.)
For a large number of rows in the table, such an index will very likely improve the performance of your UPDATE and DELETE operations.
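To check whether the optimizer actually picks the new index, you can run EXPLAIN with the same predicates your DELETE and UPDATE use (the literal values below are placeholders, not real data):

```sql
-- The `key` column of the EXPLAIN output should name the new composite
-- index, and the `rows` estimate should drop accordingly.
EXPLAIN SELECT id
  FROM characters_items
 WHERE master_id = 1 AND type = 2 AND slot_id = 3;
```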
(You aren't likely to see any performance difference, given that the auto_increment value on your table is only 904, which means you probably have less than a thousand rows in the table.)
If master_id is already "nearly unique", then the existing index on that column would be sufficient.
If you add the index I recommended, then the existing index is redundant, and could be dropped. (Any query making use of the existing index will likely make use of the new index, with master_id as the leading column.)
Yes, there are trade-offs in index design. There is additional work to be performed to maintain the indexes when DML operations are performed.
You would not want to add indexes on just slot_id or just item_id, if they are not selective, or if you don't have any queries that would make use of them. There's no point in having indexes that aren't used.
As far as other indexes, that would really depend on the other statements you are executing, in particular, the SELECT statements. We'd really want to look at the predicates (WHERE clauses and JOIN conditions), to see if additional indexes might help.
Addendum:
Q: What is the difference between adding keys separately and as a group? (like the example you gave)
In this case, three separate indexes (on master_id, type and slot_id) would not be useful, because the execution plan is likely to use only one of them, ideally the one with the highest selectivity. Execution plans that "combine indexes" (an index merge) can be faster than full table scans, but they rarely outperform a single index that already contains all of the columns.
The big difference is the "leading" column in the index. If you don't have a predicate (WHERE clause) or an ORDER BY on the leading column of the index, it's unlikely the index will be used.
The best index for your SELECT statement would be a "covering" index, that is, an index that contains ALL of the columns referenced in the query, such that the query can be satisfied from the index, without having to reference pages in the data table.
ADD KEY characters_items_IX1 (master_id, type, slot_id, item_id, amount);
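A quick way to confirm the index really is covering for the SELECT (the literal values here are placeholders):

```sql
-- "Using index" in the Extra column of the EXPLAIN output means the
-- query was satisfied from the index alone, with no visit to the
-- table's data pages.
EXPLAIN SELECT item_id, amount, slot_id
  FROM characters_items
 WHERE master_id = 1 AND type = 2;
```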
Upvotes: 1
Reputation: 52117
First of all, never construct SQL text dynamically when you can use bound parameters instead. Binding parameters protects you from SQL injection and can facilitate better performance by allowing the DBMS to prepare a SQL statement once and reuse it many times.
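As a sketch of what "prepare once, execute many" looks like at the SQL level, MySQL exposes server-side prepared statements directly (the statement name and literal values below are made up for illustration):

```sql
-- Parse the statement once; the '?' placeholders are bound at execution.
PREPARE upd_item FROM
  'UPDATE characters_items
      SET item_id = ?, amount = ?
    WHERE master_id = ? AND type = ? AND slot_id = ?';

-- Bind example values and execute; EXECUTE can be repeated with
-- different values without re-parsing the statement.
SET @item = 4151, @amount = 1, @master = 1, @type = 0, @slot = 3;
EXECUTE upd_item USING @item, @amount, @master, @type, @slot;

DEALLOCATE PREPARE upd_item;
```

In client code the same effect comes from binding parameters (as the AddParameter calls in your question already do) instead of concatenating values into the SQL string.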
As for indexes... they are generally a trade-off between finding and modifying data: they speed up the former¹ and slow down the latter. However, if a modification also incorporates a search², an index can actually end up speeding up the modification as well.
Indexes should always be tailor-made to the actual queries your application is doing, which in your case includes these:
SELECT ... WHERE master_id=... AND type=...
INSERT ...
DELETE ... WHERE slot_id=... AND master_id=... AND type=...
UPDATE ... WHERE master_id=... AND type=... AND slot_id=...
All three WHERE clauses can be efficiently "served" by a single composite index on {master_id, type, slot_id}. Only the INSERT statement (which by its nature has no WHERE clause) will be hurt by this additional index.
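As a concrete sketch of that composite index (the index name here is just an illustration):

```sql
-- One composite index serving all three WHERE clauses:
-- master_id + type for the SELECT, plus slot_id for the
-- UPDATE and DELETE.
ALTER TABLE characters_items
  ADD KEY ix_master_type_slot (master_id, type, slot_id);
```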
Considerations:
- If the number of rows per master_id is expected to be low, indexing just on master_id won't significantly impact the search performance, but it will make the index smaller and easier/quicker to maintain.
- Since the query selects item_id, amount and slot_id, we could add item_id and amount to the "trailing edge" of the index (slot_id is already in the index), making it a covering index for the SELECT.

As you can see, all this is a pretty elaborate balancing act, and even experts can't always predict the optimal balance. So if in doubt, measure before deciding!
For an excellent introduction on the topic of indexing, and database performance in general, I warmly recommend: Use The Index. Luke!
¹ Assuming they are used correctly.
² Typically: DELETE ... WHERE ... and UPDATE ... WHERE ....
Upvotes: 3
Reputation: 4657
Indexes will speed up your selects if the conditions in the WHERE clauses of your select statements have high selectivity.
Check which SELECT statements you run most often, and then index on the fields you use in their WHERE clauses. If you use wildcards (especially leading ones like '%something%'), indexes will be less useful.
I don't remember specifically if MySQL can do included columns for indexes, but if so, you may get additional benefit from adding the columns that are in the select statement but not in the where clause as included columns in the index.
Otherwise, the actual operation performed is an index seek followed by a key lookup. In InnoDB, a secondary index always contains the primary key of the associated data rows, so once the index entry is found, the row is looked up via its primary key. If you can avoid the key lookup, you can reduce the IO cost of your query significantly. This will, of course, marginally increase the cost of inserts and updates, and will drastically increase the space your database takes up.
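MySQL has no INCLUDE clause like SQL Server's; the usual workaround is to append the extra selected columns to the key itself (the index name below is illustrative):

```sql
-- Appending item_id and amount makes the index covering for the
-- question's SELECT, eliminating the key lookup at the cost of a
-- wider index to maintain on INSERT/UPDATE.
ALTER TABLE characters_items
  ADD KEY ix_covering (master_id, type, slot_id, item_id, amount);
```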
Upvotes: 0