Reputation: 2029
We are currently optimizing a MySQL table (InnoDB) that will eventially have more than 100 million rows.
In one column, we are storing IP addresses (VARCHAR 45). We need to put an index on this column, as we have to be able to retrieve all rows per specific IP address.
70% of all rows, however, will not store an IP address (empty).
Our question: Shall we store those empty values as NULL and thus ALLOW NULL on this column (will add 1 byte to each row). Or shall we NOT ALLOW NULL and store those empty values as '' (empty string)?
What is best for performance?
We will never have to search rows that are empty (= '') or null (IS NULL), only search for specific IP addresses (= '123.456.789.123').
Update: There are indeed many questions on SO that address similar scenarios. However, some answers seem to be contradictory or say "it depends". We will run some tests and post our findings for our specific scenario here.
Upvotes: 14
Views: 7703
Reputation: 155
Go with NULL
values. InnoDB has no space cost for NULL
s, and NULL
values are excluded from indexes, so you'll have a faster index lookup for the values which are present.
As far as how you store the IP itself (string verus number), that seems like a far less important point of optimization.
Upvotes: 1
Reputation: 1336
The main difference between NULL and an empty string is related to comparing values. Two empty strings are considered equal. Two NULL values are not. For example, if you want to join two tables based on IP-value columns, the result will be quite different for NULL and empty strings, and most likely you want the behavior of NULL.
If you only are going to search for specific IP-adresses, using NULL or empty string should not matter. If the IP-value column is indexed, the optimizer will obtain an estimate from InnoDB on the number of rows with the specific value. The general statistics on number of rows per value will not be used in this case.
Avoiding NULL values will save you 30 MB on 100 million rows when 70% of the rows are NULL. (For rows where the value is an empty string, you will not save any space since you will need one byte to store the length information instead.) Compared to what you can save by storing IP values as a binary string, this is nothing, and I do not think storage overhead is a valid concern.
Upvotes: 0
Reputation: 142296
VARCHAR(39)
is sufficient for both IPv4 (the old format, for which there are no more values available) and IPv6.
The optimizer may screw up if 70% of the values are the same ('' or NULL). I suggest you have another table with the IP and an ID for JOINing back to your original table. By having no 'empty' IPs in the second table, the optimizer is more likely to "do the right thing".
With that, LEFT JOIN
can be used to see if there is an IP.
IPv6 can be stored in BINARY(16) to save space.
Upvotes: 2