neilrudds

Reputation: 215

How to store JSON data with dynamic keys in SQL Server

I currently have several IoT devices connected to Azure IoT Hub publishing information and storing it in Azure SQL Server.

I would like to expand the capabilities of these devices to allow for the storage of some dynamic JSON data and understand the best way to structure this information.

The two examples below show information sent from two different IoT assets; the only keys guaranteed to be common to both are TimeStamp, AssetId, RPM, Pwr, and Runhrs. All other keys will be dynamic.

I have considered using tagging as described here: Recommended SQL database design for tags or tagging

However, I do not know how this solution would work for multiple tags against one record.

{
   "TimeStamp":"2019-04-23T18:25:43.511Z",
   "AssetId":"25896321A",
   "RPM":1000,
   "Pwr":100,
   "PF":1.00,
   "Gfrq":50.0,
   "Vg1":11000,
   "Vg2":10987,
   "Vg3":10785,
   "Vg12":0,
   "Vg23":0,
   "Vg31":0,
   "Ig1":0,
   "Ig2":0,
   "Ig3":0,
   "Mfrq":50.0,
   "Vm1":227,
   "Vm2":228,
   "Vm3":229,
   "Vm12":393,
   "Vm23":396,
   "Vm31":395,
   "MPF":0.00,
   "SRO":0.000,
   "VRO":50.0,
   "CPUT":33.6,
   "Unknown1":0,
   "GasP":0.01,
   "Mode":"AUT",
   "kWhour":13188243,
   "Runhrs":28187,
   "Numstr":3312,
   "Unknown2":122113663,
   "Unknown3":0.00,
   "OilLev":103,
   "OilT":45,
   "ThrPos":null,
   "CCPres":-0.01,
   "AirInT":29,
   "RecAT":36,
   "Unknown4":100,
   "ActPwr":0,
   "ActDem":0,
   "ActPfi":0,
   "CylA1":51,
   "CylA2":51,
   "CylA3":51,
   "CylA4":51,
   "CylA5":51,
   "CylA6":50,
   "CylB1":53,
   "CylB2":53,
   "CylB3":53,
   "CylB4":53,
   "CylB5":52,
   "CylB6":53,
   "JWTin":50,
   "JWTout":50,
   "JWGKin":36,
   "Unknown5":8211,
   "Unknown6":2,
   "CH4":0,
   "BIN":"1010001111000001"
}

or

{
   "TimeStamp":"2019-04-23T18:28:20.511Z",
   "AssetId":"28547896Z",
   "RPM":1000,
   "Pwr":100,
   "PF":1.00,
   "Gfrq":50.0,
   "Vg1":11000,
   "Vg2":10987,
   "Vg3":10785,
   "Vg12":0,
   "Vg23":0,
   "Vg31":0,
   "Ig1":0,
   "Ig2":0,
   "Ig3":0,
   "Mfrq":50.0,
   "Vm1":227,
   "Vm2":228,
   "Vm3":229,
   "Vm12":393,
   "Vm23":396,
   "Vm31":395,
   "MPF":0.00,
   "SRO":0.000,
   "VRO":50.0,
   "CPUT":33.6,
   "Unknown1":0,
   "GasP":0.01,
   "Mode":"AUT",
   "kWhour":13188243,
   "Runhrs":28187,
   "Numstr":3312,
   "Unknown2":122113663,
   "Unknown3":0.00,
   "OilLev":103,
   "OilT":45,
   "ThrPos":null,
   "CCPres":-0.01,
   "AirInT":29,
   "RecAT":36,
   "Unknown4":100,
   "ActPwr":0,
   "ActDem":0,
   "ActPfi":0,
   "CylA1":51,
   "CylA2":51,
   "CylA3":51,
   "CylA4":51,
   "CylA5":51,
   "CylA6":50,
   "CylA7":51,
   "CylA8":50,
   "CylB1":53,
   "CylB2":53,
   "CylB3":53,
   "CylB4":53,
   "CylB5":52,
   "CylB6":53,
   "CylB7":51,
   "CylB8":50,
   "JWTin":50,
   "CH4":0,
   "BIN":"1010001111000001"
}

We have over 300 IoT devices publishing this information at least every 10 minutes, i.e. a minimum of 144 records per device per day (over 43,000 records per day across the fleet).

What would be the best way to store / structure the dynamic JSON data?

Queries will be performed from a web reporting platform and will need to be fast.

Upvotes: 0

Views: 1312

Answers (2)

Conor Cunningham MSFT

Reputation: 4501

With SQL Server/SQL Azure, you have the choice to either store the data as JSON (you can add validation on top of it using CHECK constraints with the JSON functions) or to shred it into a more native schema format. Storing it as nvarchar (TEXT is a deprecated legacy data type in SQL Server) is somewhat less efficient in terms of raw storage, but ultimately the trade-off comes down to whether you are just storing the data or running complex queries over it. If you have complex queries, you'll eventually want to store the data in a more native format where the query processor (QP) can help you at query time, which is more efficient than scanning the whole data set every time.
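As a rough sketch of the store-as-JSON route (table and column names here are illustrative, not from the question), the payload can live in an nvarchar(max) column guarded by an ISJSON CHECK constraint, with the guaranteed keys promoted to real columns:

```sql
-- Illustrative table; names are assumptions, not from the question.
CREATE TABLE dbo.DeviceTelemetry
(
    TelemetryId bigint IDENTITY PRIMARY KEY,
    [TimeStamp] datetime2     NOT NULL,
    AssetId     varchar(20)   NOT NULL,
    Payload     nvarchar(max) NOT NULL,
    -- Reject anything that is not well-formed JSON.
    CONSTRAINT CK_DeviceTelemetry_Payload CHECK (ISJSON(Payload) = 1)
);
```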

I will recommend three possible paths for you.

  1. Keep it as JSON. You can add computed columns over the JSON functions if you want to filter on specific fields. (This is the least efficient option, but also the least work.)
  2. If you are always going to be getting lots of new attributes in your JSON, another possible path is to shred the JSON (assuming it is flat, as in your examples) into sparse columns. This storage format is optimal for rows where many of the values are NULL; see Sparse Columns.
  3. If you know the columns up front, the most efficient way to store lots and lots of data is with columnstore indexes, which you can read about here. These are highly compressed, efficient at aggregation operations, and you can add b-tree indexes on top of a clustered columnstore index to support some kinds of filtering. Partitioning them is also possible, with the same ability to filter. (This is what most people who want an easy go-fast solution for lots of fact data are doing these days if they don't want to create a Kimball-style DW schema.)
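A minimal sketch of option 1, assuming a hypothetical dbo.DeviceTelemetry table whose Payload column holds the raw JSON: frequently filtered fields become persisted computed columns that can be indexed, while rarely used fields are still reachable via JSON_VALUE at query time:

```sql
-- Assumes a table dbo.DeviceTelemetry with an nvarchar(max) Payload column;
-- all names here are illustrative.
ALTER TABLE dbo.DeviceTelemetry
    ADD Rpm AS CAST(JSON_VALUE(Payload, '$.RPM') AS int) PERSISTED;

CREATE INDEX IX_DeviceTelemetry_Rpm ON dbo.DeviceTelemetry (Rpm);

-- Ad-hoc fields can still be pulled straight out of the JSON.
SELECT AssetId,
       JSON_VALUE(Payload, '$.OilT') AS OilT
FROM dbo.DeviceTelemetry
WHERE Rpm >= 1000;
```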

Upvotes: 3

Jayasanka Weerasinghe

Reputation: 154

MySQL supports a native JSON data type, which stores JSON data more efficiently than a TEXT column and validates documents automatically on insert.

There are several MySQL functions for querying and updating JSON data.

The MySQL manual provides further information about the JSON data type and the associated JSON functions. MySQL :: The JSON Data Type

You can find a great tutorial on MySQL JSON Data fields here: How to Use JSON Data Fields in MySQL Databases
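A minimal sketch of the MySQL approach (names are illustrative, not from the question): the JSON column type rejects malformed documents on insert, and individual fields can be read with the ->> path operator:

```sql
-- Illustrative MySQL table; names are assumptions.
CREATE TABLE device_telemetry (
    id       BIGINT AUTO_INCREMENT PRIMARY KEY,
    asset_id VARCHAR(20) NOT NULL,
    payload  JSON NOT NULL  -- malformed JSON is rejected automatically
);

-- ->> extracts a field as unquoted text
-- (shorthand for JSON_UNQUOTE(JSON_EXTRACT(...))).
SELECT asset_id,
       payload->>'$.RPM' AS rpm
FROM device_telemetry
WHERE payload->>'$.Mode' = 'AUT';
```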

On the other hand, you could use a NoSQL database. NoSQL databases easily store and combine any type of data, and NoSQL models handle massive amounts of data well.

Upvotes: 1
