Reputation: 4309
An FCM token is in the format of
fKk623mCfkm4:APA21bG2PqYzjxkwVElcODnMdTk9BP1AoNw-rogg8uUdK9lzIc0Ow0KVa_19PRZE85v_0VZjBPBve4PAwqX9mVstro2fnzWoChgOVdIRU0YvOMRniOwg-KGB5EjajqD3Szl2lBwPZQBJ
Which is fairly long.
If one wants to store the token in MySQL and make sure that the token is unique, how would one go about doing that? I reckon the string is a bit too long to index. Is it reasonable to assume that the part before the colon (in this case fKk623mCfkm4) is unique within the scope of the application? If so, that would be a much more reasonable way to index the token (still a VARCHAR, but with CREATE UNIQUE INDEX fcm_token ON myTable (myColumn(12))).
I realize that this may be a long shot, so: how would one solve this problem?
I am interested in the uniqueness of the token because a device might change owner (or user, for whatever reason), which would require the token to be reassigned in my database. That is hard (and irresponsible) to do if I cannot find the row because the token is not indexed.
A string of 12 alphanumeric characters can take 62^12, roughly 3.2E21, different values. That kind of randomness seems like it would be enough to avoid a collision within a single application. Edit: Bad idea.
I am doing this currently:
CREATE TABLE `devices` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`USER` int(10) unsigned NOT NULL,
`TOKEN` varchar(255) NOT NULL,
`TOKENSHA1` binary(20) DEFAULT NULL, -- raw SHA-1 of TOKEN; fixed-length, so cheap to index uniquely
`MODIFIED` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`ID`),
UNIQUE KEY `TOKENSHA1` (`TOKENSHA1`),
KEY `USER` (`USER`),
KEY `MODIFIED` (`MODIFIED`)
) ENGINE=InnoDB;
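For completeness, here is one way a row could be written against this schema: a sketch only, assuming TOKENSHA1 holds the raw SHA-1 digest of TOKEN (the user id 42 and @token are placeholders, not values from the question):
SET @token = '<the full FCM token>';
-- UNHEX(SHA1(...)) converts the 40-char hex digest into the BINARY(20)
-- form the unique key expects; on a duplicate, reassign the token's owner.
INSERT INTO devices (USER, TOKEN, TOKENSHA1)
VALUES (42, @token, UNHEX(SHA1(@token)))
ON DUPLICATE KEY UPDATE USER = VALUES(USER);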
Upvotes: 4
Views: 4954
Reputation: 142366
Note to other readers: This answer assumes that the only purpose for the table is to catch dups. This assumption turns out not to be correct.
How big is your table? Let's say you have 1 million rows. That index (in its own BTree) would take about 250MB on disk. How much RAM do you have? What is the setting of innodb_buffer_pool_size? Both are probably big enough to afford caching the 250MB. If you don't need to test more than 100 tokens per second, it should not be a problem even if you have to hit disk. (100 inserts/sec would add up to over 8M rows per day.)
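To check the current setting:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';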
It looks like the token is ascii, so be sure to specify the charset, something like
FCM VARCHAR(156) CHARACTER SET ascii NOT NULL
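Applied to the table in the question, that could be done with something like the following (assuming no token ever exceeds 156 characters, which the question's example suggests but Firebase does not document):
ALTER TABLE devices MODIFY TOKEN varchar(156) CHARACTER SET ascii NOT NULL;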
Do not do CREATE UNIQUE INDEX ... myColumn(12) - the table still stores the entire 156 characters, but uniqueness is checked on only the first 12. Prefix indexing is a virtually useless feature, and is often the "wrong" thing to do.
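Spelled out, the warned-against prefix index would be (table and index names taken from the question):
-- Only the first 12 characters take part in the uniqueness check, so two
-- distinct tokens that happen to share a 12-char prefix would collide.
CREATE UNIQUE INDEX fcm_token ON devices (TOKEN(12));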
Plans A, B, C
If the only purpose is to check whether the 156-char string has already been seen...
Plan A:
CREATE TABLE x (
fcm CHAR(156) CHARACTER SET ascii NOT NULL,
PRIMARY KEY(fcm)
) ENGINE=InnoDB;
0 copies of ID (0 bytes)
0 copies of SHA1 (0 bytes)
1 copy of TOKEN (156 bytes, assuming ascii)
Total: 156 bytes (plus overhead)
Plan B:
CREATE TABLE x (
digest BINARY(16) NOT NULL, -- MD5(fcm)
PRIMARY KEY(digest)
) ENGINE=InnoDB;
0 copies of ID (0 bytes)
1 copy of MD5 (16 bytes)
0 copies of TOKEN (0 bytes)
Total: 16 bytes
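A sketch of how Plan B would be exercised (@token is a placeholder):
-- MD5() yields 32 hex chars; UNHEX() packs them into the BINARY(16) key.
-- A duplicate-key error (MySQL error 1062) means the token was already seen.
INSERT INTO x (digest) VALUES (UNHEX(MD5(@token)));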
Either of these Plans has one BTree. Either will very quickly test for a single dup. Either will touch only one block when you INSERT a new row.
Plan A will take more space than Plan B. Some are concerned about accidental dups in digests (MD5, SHA1, etc.) and would shun Plan B; otherwise I would not have included Plan A at all.
Either Plan will eventually slow down -- this will happen when the table grows so big that it cannot be cached in RAM. When the table is 20 times as big as the cache, only 1/20th of the lookups will find the record in RAM. Plan A, being bigger, will start to slow down sooner.
Plan C: Putting both the 156-byte token and the 16-byte digest in the table makes for a bigger table. Hence you get to the slowdown sooner (than with at least one of A or B), regardless of how it is indexed.
PRIMARY KEY(md5)
0 copies of ID (0 bytes)
1 copy of md5 (16 bytes)
1 copy of TOKEN (156 bytes, assuming ascii)
Total: 172 bytes
After adding more columns and indexes
Plan D (by nickdnk):
PRIMARY KEY (`ID`),
UNIQUE KEY `TOKENSHA1` (`TOKENSHA1`),
KEY `USER` (`USER`),
KEY `MODIFIED` (`MODIFIED`)
4 copies of ID (4*4 bytes)
2 copies of SHA1 (2*20 bytes, assuming BINARY(20), not VARCHAR)
1 copy of TOKEN (156 bytes, assuming ascii)
Total: 212 bytes (plus overhead)
Plan E:
PRIMARY KEY(TOKEN),
INDEX(USER),
INDEX(MODIFIED)
0 copies of ID (0 bytes)
0 copies of SHA1 (0 bytes)
3 copies of TOKEN (3*156 bytes, assuming ascii)
Total: 468 bytes
Plan F:
PRIMARY KEY(TOKENSHA1),
INDEX(USER),
INDEX(MODIFIED)
0 copies of ID (0 bytes)
3 copies of SHA1 (3*20 bytes)
1 copy of TOKEN (156 bytes, assuming ascii)
Total: 216 bytes
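Spelled out as full DDL, Plan F might look like the following (a sketch only, with column definitions carried over from the question's table):
CREATE TABLE `devices` (
`USER` int(10) unsigned NOT NULL,
`TOKEN` varchar(156) CHARACTER SET ascii NOT NULL,
`TOKENSHA1` binary(20) NOT NULL, -- SHA-1 of TOKEN; now the row's identity
`MODIFIED` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`TOKENSHA1`),
KEY `USER` (`USER`),
KEY `MODIFIED` (`MODIFIED`)
) ENGINE=InnoDB;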
So, your Plan (D) is good, especially if you need more indexes. My Plan F is essentially as good. (The 4-byte loss is compensated for by overhead.) INSERTs are slowed down some by having to check two UNIQUE keys. SELECTs may encounter performance differences.
Upvotes: 1
Reputation: 1892
That example string is not too long to index in MySQL's InnoDB engine.
MySQL's InnoDB engine has a maximum index key length of 767 bytes.
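With an ascii character set, the 156-character token is 156 bytes per key, comfortably under that limit, so the whole column can be indexed directly; for example (reusing the question's table, as a sketch):
CREATE UNIQUE INDEX fcm_token ON devices (TOKEN);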
If you want to index something that will enforce uniqueness, why not look for options that make no assumptions about the subsections of the token? Such options would include indexing the entire token, or indexing a fixed-length hash (MD5, SHA-1) of it, as the other answers discuss.
It does not seem wise to make assumptions about the part before the : unless the existence and purpose of this part are documented. Even if it works now, you can't be sure it will work every time in the future. There's a good chance that this token is either a Base64-encoded public key, or a cryptographic hash, or something like a JWT whose contents may change in the future.
The Firebase documentation that I have read presents this token as simply a token, with no explanation of its content. It looks Base64-encoded, but I have not seen that documented.
Upvotes: 2