PunjCoder

Reputation: 470

Choosing a string primary key vs using a join table with corresponding integer id

I have a large sensor-data count table, say SENSORS_COUNT, with a string column SID referring to another table, SENSOR_DEFINITIONS, whose primary key is also SID. Since there are millions of data points, the index on the string key becomes (1) bloated and (2) slow. The total number of sensors is quite small (< 2000).

I can think of three ways to make the queries faster:

  1. Use a join table that translates the string key into a corresponding integer key, and join on it in all queries
  2. Load the string-to-integer translation as a hash in program memory and refer to it within the code
  3. Keep an index on the string id (which would be slower than an integer, though)

I'm trying to build a system for a variety of sensors which may have different types of string ids (but the same schema). What would be the best recommendation here?
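To make option 1 concrete, here is a sketch of the join-table approach (the SENSOR_IDS table and SENSOR_ID column names are illustrative, not part of the actual schema):

```sql
-- Small lookup table translating each string SID to a compact integer id.
-- SMALLINT UNSIGNED (2 bytes) is plenty for < 2000 sensors.
CREATE TABLE SENSOR_IDS (
    SENSOR_ID SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    SID       VARCHAR(20) NOT NULL,
    UNIQUE KEY (SID)
);

-- The big fact table then stores the 2-byte SENSOR_ID instead of the
-- VARCHAR(20), and queries by string id go through a join on the small table:
SELECT c.DATETIME, c.VALUE
FROM   SENSORS_COUNT c
JOIN   SENSOR_IDS    s ON s.SENSOR_ID = c.SENSOR_ID
WHERE  s.SID = 'TEMP-001';
```

The join against a ~2000-row table is cheap (it stays fully cached), while the fact table and its indexes shrink because every key entry is 2 bytes instead of up to 20.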

EDIT 1: This is the schema. And yes (thanks for the correction), in the SENSORS_COUNT table, SID is not a primary key:

TABLE: SENSOR_DEFINITIONS (2000 records)
SID : VARCHAR(20), PRIMARY KEY
SNAME: VARCHAR(50)

TABLE: SENSORS_COUNT (N million records)
SID: VARCHAR(20)
DATETIME: TIMESTAMP
VALUE: INTEGER

Upvotes: 1

Views: 1238

Answers (1)

Rick James

Reputation: 142268

For "large" tables, normalization becomes more important, especially when the table is too big to be cached.

So, I agree with the choice of using a SMALLINT UNSIGNED (2 bytes, 0..64K) for the 2000 sensor names, not a VARCHAR(...).

Without seeing (1) the SHOW CREATE TABLE and (2) some critical SELECTs, it is hard to give further advice.

Probably, a "composite" PRIMARY KEY would be better than an AUTO_INCREMENT. It might be (sensor_id, datetime), but it would help to see the selects first.
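As a sketch of that suggestion, with column names guessed from the question's schema (the actual choice should wait for the SELECTs):

```sql
-- Normalized fact table: a 2-byte sensor id plus a composite PRIMARY KEY.
-- InnoDB clusters rows by the PK, so (SENSOR_ID, DATETIME) stores each
-- sensor's readings contiguously -- good for per-sensor time-range scans.
CREATE TABLE SENSORS_COUNT (
    SENSOR_ID SMALLINT UNSIGNED NOT NULL,  -- references the definitions table
    DATETIME  TIMESTAMP NOT NULL,
    VALUE     INT NOT NULL,
    PRIMARY KEY (SENSOR_ID, DATETIME)
) ENGINE=InnoDB;
```

Note there is no AUTO_INCREMENT id here; the composite key both enforces uniqueness and serves the likely access pattern, saving another index.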

Do not have two tables with the same schema (without a good reason).

Upvotes: 1
