Reputation: 1450
I am undertaking NoSQL document design for the below tables:
I have below tables where I am trying to do data modelling:
I have above tables in SQL server relational database where we maintain TaxAssignments for Items and scancodes, below is the sample data for TaxType & Location (Location table is self referential table with Fk_LocationId) table:
Below is the table for TaxAssignments:
I am trying to convert above SQL tables into NoSQL document DB, it's a one-to-may relationships between TaxType and TaxAssign, Location & TaxAssign most of the time from above table the queries are based on (FK_RetailItemCode or FK_TaxTypeCode), or by (ScanCode or FK_TaxTypeCode)
I want to design the document json, but it's been very hard for me to pick the partition key, ItemCode, ScanCode are queried a lot but they are optional fields so I cannot include them as part of partition key, so I picked UniqueIdentifer as partitionkey to spread out data into multiple logical partitions.
Did I pick the right key? When I query I don't query by UniqueIdentifier but by ItemCode or ScanCode with TaxType optional.
Below is my JSON document, are there any modifications or changes required in the design or should I take a different approach in order to design this:
{
"UniqueIdentifier": "1999-10-20-07.55.05.090087",
"EffectiveDate": "1999-10-20",
"TerminationDate": "9999-12-31",
"LocationId": 1,
"FK_RetailItemCode": 852874,
"FK_TaxTypeCode": 1,
"TaxType": [
{
"TaxTypeCode": 1,
"TaxArea": "STATE ",
"Description": "SALES TAX ",
"IsPOSTaxEnabled": "Y",
"POSTaxField": 1,
"TaxOrder": 0,
"IsCityLimit": " ",
"IsTaxFree": false,
"Location": [
{
"LocationId": 1,
"LocationType": "ST",
"City": " ",
"County": " ",
"Country": "USA ",
"CountyCode": 0,
"State": "ALABAMA ",
"StateShortName": "AL",
"SortSequence": 40
}
]
}
]
},
{
"UniqueIdentifier": "2019-06-13-08.51.48.004124",
"EffectiveDate": "2019-06-13",
"TerminationDate": "2019-08-05",
"LocationId": 13531,
"FK_RetailItemCode": 852784,
"FK_TaxTypeCode": 16,
"TaxType": [
{
"TaxTypeCode": 16,
"TaxArea": "CITY ",
"Description": "HOSPTLY TAX OUT CITY LIM ",
"IsPOSTaxEnabled": "Y",
"POSTaxField": 2,
"TaxOrder": 1,
"IsCityLimit": "N",
"IsTaxFree": false,
"Location": [
{
"LocationId": 13531,
"LocationType": "CI",
"City": "FOLEY ",
"County": "BALDWIN ",
"Country": "USA ",
"CountyCode": 2,
"State": "ALABAMA ",
"StateShortName": "AL",
"FK_LocationId": 13510
}
]
}
]
}
TaxAssignment is a huge table with 6 millon data, so I want to spread data as much as I can so I picked UniqueIdentifier as partition key, I couldn't pick the other partition key which are queried so often as that columns ItemCode & ScanCode are optional (nullable).
Questions:
TaxType --> number of records is 19
Location --> number of records is 38000
TaxAssign --> number of records is 6 million.
Upvotes: 0
Views: 779
Reputation: 8763
It's rather difficult to answer questions like this and I don't think I can fully answer yours here but I can provide some advice and also point you to resources to help you figure this out yourself.
For 1:few relationship you typically will want to embed this. TaxType and LocationType sound like good candidate to embed. However you will want to maintain these in their own container and use ChangeFeed to update them in your TaxAssignments table.
For 1:Many relationships, especially if they are unbounded you typically will want to reference this. Not sure if Locations is unbounded in your app. If it isn't then embed this too.
Typically you never want to pick a partition key that is never used in a query. If you do then EVERY query will be cross partition so the larger your container gets, the worse your app will perform. Basically it will not scale.
If you have some queries that use one property for queries and other queries that use another property, one option would be to use ChangeFeed and keep duplicate copies of the data optimized for those queries. However, you will want to measure the cost of these queries and multiply that by the number of times a month the query is run, then calculate the cost for using Change Feed. Change Feed consumes 2 RU/s each second to poll your container, then 1 RU/s for each 1Kb of less of data read and ~8 RU/s for each insert into the target container depending on your index policy. Multiply that by the number of new/updated records per month and you have a decent estimate.
If you want more details on how to design this type of database feel free to check out these links.
Modeling and partitioning session from Ignite 2019
GitHub Repo that shows the demos run for that session but in .NET
Doc on modeling and partitioning
Upvotes: 1