Reputation: 4309
An FCM token is in the format of
fKk623mCfkm4:APA21bG2PqYzjxkwVElcODnMdTk9BP1AoNw-rogg8uUdK9lzIc0Ow0KVa_19PRZE85v_0VZjBPBve4PAwqX9mVstro2fnzWoChgOVdIRU0YvOMRniOwg-KGB5EjajqD3Szl2lBwPZQBJ
Which is fairly long.
If one wants to store the token in MySQL and make sure that the token is unique, how would one go about doing that? I reckon the string is a bit too long to index. Is it reasonable to assume that the part before the colon (in this case fKk623mCfkm4) is unique within the scope of the application? If so, that would be a much more reasonable way to index the token (still a VARCHAR, but with CREATE UNIQUE INDEX fcm_token ON myTable (myColumn(12))).
I realize that this may be a long shot, so: how would one solve this problem?
I am interested in the uniqueness of the token because a device might change owner (or user, for whatever reason), which would require the token to be reassigned in my database. That is hard (and irresponsible) to do if I cannot find the row because the token is not indexed.
A string of 12 alphanumeric characters can take 62^12, roughly 3.2E21, different values. That kind of randomness seems like it would be enough to avoid a collision within a single application. Edit: Bad idea.
I am doing this currently:
CREATE TABLE `devices` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`USER` int(10) unsigned NOT NULL,
`TOKEN` varchar(255) NOT NULL,
`TOKENSHA1` binary(20) DEFAULT NULL, -- raw SHA-1 of TOKEN; fixed-length, so cheap to index uniquely
`MODIFIED` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`ID`),
UNIQUE KEY `TOKENSHA1` (`TOKENSHA1`),
KEY `USER` (`USER`),
KEY `MODIFIED` (`MODIFIED`)
) ENGINE=InnoDB;
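For completeness, here is one way a row could be written against this schema: a sketch only, assuming TOKENSHA1 holds the raw SHA-1 digest of TOKEN (the user id 42 and @token are placeholders, not values from the question):
SET @token = '<the full FCM token>';
-- UNHEX(SHA1(...)) converts the 40-char hex digest into the BINARY(20)
-- form the unique key expects; on a duplicate, reassign the token's owner.
INSERT INTO devices (USER, TOKEN, TOKENSHA1)
VALUES (42, @token, UNHEX(SHA1(@token)))
ON DUPLICATE KEY UPDATE USER = VALUES(USER);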
Upvotes: 4
Views: 4954
Reputation: 142366
Note to other readers: This answer assumes that the only purpose for the table is to catch dups. This assumption turns out not to be correct.
How big is your table? Let's say you have 1 million rows. That index (in its own BTree) would take about 250MB on disk. How much RAM do you have? What is the setting of innodb_buffer_pool_size? Both are probably big enough to afford caching the 250MB. If you don't need to test more than 100 tokens per second, it should not be a problem even if you have to hit disk. (100 inserts/sec would add up to over 8M rows per day.)
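To check the current setting:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';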
It looks like the token is ascii, so be sure to specify the charset, something like
FCM VARCHAR(156) CHARACTER SET ascii NOT NULL
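Applied to the table in the question, that could be done with something like the following (assuming no token ever exceeds 156 characters, which the question's example suggests but Firebase does not document):
ALTER TABLE devices MODIFY TOKEN varchar(156) CHARACTER SET ascii NOT NULL;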
Do not do CREATE UNIQUE INDEX ... myColumn(12) - the table still stores the entire 156 characters, but uniqueness is checked on only the first 12. Prefix indexing is a virtually useless feature, and is often the "wrong" thing to do.
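Spelled out, the warned-against prefix index would be (table and index names taken from the question):
-- Only the first 12 characters take part in the uniqueness check, so two
-- distinct tokens that happen to share a 12-char prefix would collide.
CREATE UNIQUE INDEX fcm_token ON devices (TOKEN(12));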
Plans A, B, C
If the only purpose is to check whether the 156-char string has already been seen...
Plan A:
CREATE TABLE x (
fcm CHAR(156) CHARACTER SET ascii NOT NULL,
PRIMARY KEY(fcm)
) ENGINE=InnoDB;
0 copies of ID (0 bytes)
0 copies of SHA1 (0 bytes)
1 copy of TOKEN (156 bytes, assuming ascii)
Total: 156 bytes (plus overhead)
Plan B:
CREATE TABLE x (
digest BINARY(16) NOT NULL, -- MD5(fcm)
PRIMARY KEY(digest)
) ENGINE=InnoDB;
0 copies of ID (0 bytes)
1 copy of MD5 (16 bytes)
0 copies of TOKEN (0 bytes)
Total: 16 bytes
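A sketch of how Plan B would be exercised (@token is a placeholder):
-- MD5() yields 32 hex chars; UNHEX() packs them into the BINARY(16) key.
-- A duplicate-key error (MySQL error 1062) means the token was already seen.
INSERT INTO x (digest) VALUES (UNHEX(MD5(@token)));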
Either of these Plans has one BTree. Either will very quickly test for a single dup. Either will touch only one block when you INSERT a new row.
Plan A will take more space than Plan B. Some are concerned about accidental dups in digests (MD5, SHA1, etc.) and would shun Plan B; otherwise I would not have included Plan A at all.
Either Plan will eventually slow down -- this will happen when the table grows so big that it cannot be cached in RAM. When the table is 20 times as big as the cache, only 1/20th of the lookups will find the record in RAM. Plan A, being bigger, will start to slow down sooner.
Plan C: Putting both the 156-byte token and the 16-byte digest in the table makes for a bigger table. Hence you get to the slowdown sooner (than with at least one of A or B), regardless of how it is indexed.
PRIMARY KEY(md5)
0 copies of ID (0 bytes)
1 copy of md5 (16 bytes)
1 copy of TOKEN (156 bytes, assuming ascii)
Total: 172 bytes
After adding more columns and indexes
Plan D (by nickdnk):
PRIMARY KEY (`ID`),
UNIQUE KEY `TOKENSHA1` (`TOKENSHA1`),
KEY `USER` (`USER`),
KEY `MODIFIED` (`MODIFIED`)
4 copies of ID (4*4 bytes)
2 copies of SHA1 (2*20 bytes, assuming BINARY(20), not VARCHAR)
1 copy of TOKEN (156 bytes, assuming ascii)
Total: 212 bytes (plus overhead)
Plan E:
PRIMARY KEY(TOKEN),
INDEX(USER),
INDEX(MODIFIED)
0 copies of ID (0 bytes)
0 copies of SHA1 (0 bytes)
3 copies of TOKEN (3*156 bytes, assuming ascii)
Total: 468 bytes
Plan F:
PRIMARY KEY(TOKENSHA1),
INDEX(USER),
INDEX(MODIFIED)
0 copies of ID (0 bytes)
3 copies of SHA1 (3*20 bytes)
1 copy of TOKEN (156 bytes, assuming ascii)
Total: 216 bytes
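Spelled out as full DDL, Plan F might look like the following (a sketch only, with column definitions carried over from the question's table):
CREATE TABLE `devices` (
`USER` int(10) unsigned NOT NULL,
`TOKEN` varchar(156) CHARACTER SET ascii NOT NULL,
`TOKENSHA1` binary(20) NOT NULL, -- SHA-1 of TOKEN; now the row's identity
`MODIFIED` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`TOKENSHA1`),
KEY `USER` (`USER`),
KEY `MODIFIED` (`MODIFIED`)
) ENGINE=InnoDB;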
So, your Plan (D) is good, especially if you need more indexes. My Plan F is essentially as good. (The 4-byte loss is compensated for by overhead.) INSERTs are slowed down some by having to check two UNIQUE keys. SELECTs may encounter performance differences.
Upvotes: 1
Reputation: 1892
That example string is not too long to index in MySQL's InnoDB engine.
MySQL's InnoDB engine has a maximum index key length of 767 bytes.
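With an ascii character set, the 156-character token is 156 bytes per key, comfortably under that limit, so the whole column can be indexed directly; for example (reusing the question's table, as a sketch):
CREATE UNIQUE INDEX fcm_token ON devices (TOKEN);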
If you want to index something that will enforce uniqueness, why not look for options that make no assumptions about the subsections of the token? Such options would include indexing the entire token, or indexing a fixed-length hash (MD5, SHA-1) of it, as the other answers discuss.
It does not seem wise to make assumptions about the part before the : unless the existence and purpose of this part are documented. Even if it works now, you can't be sure it will work every time in the future. There's a good chance that this token is either a Base64-encoded public key, or a cryptographic hash, or something like a JWT whose contents may change in the future.
The Firebase documentation that I have read presents this token as simply a token, with no explanation of its content. It looks Base64-encoded, but I have not seen that documented.
Upvotes: 2