I have a dilemma about choosing the best (synthetic) value for the partition key when storing user data.
The User document has:
- id (guid)
- email (used to log in, among other things)
- profile data
There are 2 main types of queries:
- by id (most queries)
- by email (login and some admin queries)

I want to avoid cross-partition queries.
If I choose id for partitionKey (a synthetic field), then login queries would be cross-partition.
On the other hand, if I choose email, then it's a problem if a user ever changes their email.
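To make the trade-off concrete, here is a toy in-memory model (not the Cosmos DB SDK). Documents are hashed into a fixed number of physical partitions by their partition key value, so a lookup that supplies the key touches one partition, while a lookup by any other field has to scan all of them. The partition count, the hash function and the `ToyContainer` class are assumptions made up for illustration.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption: a small, fixed number of physical partitions


def partition_for(key: str) -> int:
    """Hash a partition key value to a physical partition (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


class ToyContainer:
    """Hypothetical stand-in for a container partitioned on one field."""

    def __init__(self, pk_field: str):
        self.pk_field = pk_field
        self.partitions = defaultdict(list)

    def add(self, doc: dict) -> None:
        self.partitions[partition_for(doc[self.pk_field])].append(doc)

    def find(self, field: str, value: str):
        """Return (matching docs, number of partitions scanned)."""
        if field == self.pk_field:
            # Single-partition query: route directly by the key value.
            targets = [partition_for(value)]
        else:
            # Cross-partition (fan-out) query: scan every partition.
            targets = range(NUM_PARTITIONS)
        hits = [d for p in targets for d in self.partitions[p] if d[field] == value]
        return hits, len(targets)


users = ToyContainer(pk_field="id")
users.add({"id": "u1", "email": "a@example.com"})
users.add({"id": "u2", "email": "b@example.com"})

print(users.find("id", "u1")[1])               # 1 partition scanned
print(users.find("email", "b@example.com")[1])  # 4 partitions scanned
```

With id as the key, the "most queries" path stays single-partition, and the login path by email is the one that fans out.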
What I am thinking is to introduce a new type within the collection, something like:
{
  "userId": "someguid",
  "userEmail": "email1",
  "partitionKey": "users-mappings"
}
Then I can have the User document itself as:
{
  "id": "someguid",
  "type": "user",
  "partitionKey": "user_someguid",
  "profileData": {}
}
That way, when a user logs in, I first query the mappings type/partition by email, get the guid, and then read the actual User document by guid.
Also, this way the email can be changed without affecting partitioning.
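A minimal in-memory sketch of the mapping design described above, assuming a single container keyed on `partitionKey`, where each logical partition is modelled as a dict from `id` to document. Two assumptions are not in the question: the mapping document's `id` is the email (so the email lookup becomes a point read rather than a query), and the helper names `create_user`, `login_lookup` and `change_email` are hypothetical.

```python
from typing import Dict, Optional

# Toy container: logical partition key value -> {document id -> document}.
container: Dict[str, Dict[str, dict]] = {}

MAPPINGS_PK = "users-mappings"


def upsert(doc: dict) -> None:
    container.setdefault(doc["partitionKey"], {})[doc["id"]] = doc


def point_read(pk: str, doc_id: str) -> Optional[dict]:
    # A point read needs both the partition key value and the id.
    return container.get(pk, {}).get(doc_id)


def delete(pk: str, doc_id: str) -> None:
    container.get(pk, {}).pop(doc_id, None)


def create_user(user_id: str, email: str, profile: dict) -> None:
    upsert({"id": user_id, "type": "user",
            "partitionKey": f"user_{user_id}", "profileData": profile})
    # Assumption: the mapping's id is the email, so lookup by email is a point read.
    upsert({"id": email, "userId": user_id, "userEmail": email,
            "partitionKey": MAPPINGS_PK})


def login_lookup(email: str) -> Optional[dict]:
    mapping = point_read(MAPPINGS_PK, email)   # step 1: email -> guid
    if mapping is None:
        return None
    uid = mapping["userId"]
    return point_read(f"user_{uid}", uid)      # step 2: guid -> User document


def change_email(user_id: str, old: str, new: str) -> None:
    # Only the mapping partition is touched; the User document keeps its key.
    upsert({"id": new, "userId": user_id, "userEmail": new,
            "partitionKey": MAPPINGS_PK})
    delete(MAPPINGS_PK, old)


create_user("someguid", "old@example.com", {"name": "Ann"})
change_email("someguid", "old@example.com", "new@example.com")
print(login_lookup("new@example.com")["id"])  # someguid
print(login_lookup("old@example.com"))         # None
```

One design consequence worth weighing: every mapping shares the single "users-mappings" partition value, so all login lookups land on that one logical partition, which is also subject to Cosmos DB's per-logical-partition storage limit.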
Is this a valid approach? Are there any problems with it? Am I missing something?
As you already know, when querying Cosmos DB, a fan-out (cross-partition) query should be the last resort, especially for a high-volume action such as logging in. The RU cost will also be significantly higher with large data sets.
In the Cosmos DB SQL API, one pattern is to use synthetic partition keys: you compose a synthetic partition key by concatenating the id and the email on write. This pattern works for a wide range of query scenarios and provides flexibility.
Something like this:
{
"id": "123",
"email": "user@example.com",
"partitionKey": "123user@example.com"
}
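The write-side composition can be sketched as a small helper. It follows the answer in concatenating `id` and `email` with no separator; `make_user_doc` is a hypothetical name, and the sample values are made up.

```python
def make_user_doc(user_id: str, email: str) -> dict:
    """Build a user document whose synthetic partition key is id + email."""
    return {
        "id": user_id,
        "email": email,
        # Synthetic key: plain concatenation of id and email, computed on write.
        "partitionKey": user_id + email,
    }


doc = make_user_doc("123", "user@example.com")
print(doc["partitionKey"])  # 123user@example.com
```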
Then on read, do something like this:
SELECT s.something
FROM s
WHERE STARTSWITH(s.partitionKey, "123")
   OR ENDSWITH(s.partitionKey, "user@example.com")
You can also use SUBSTRING() and similar string functions.
With the above approach, you can search for a user either by their id or email and still use the efficiency of a partition key, minimizing your query RU cost and optimizing performance.
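The two predicates in the WHERE clause can be mirrored in Python to see which keys each one matches. This models only the filter logic (`str.startswith`/`str.endswith` standing in for `STARTSWITH`/`ENDSWITH`), not how Cosmos DB plans or routes the query; the function names and sample keys are made up.

```python
def match_by_id(partition_key: str, user_id: str) -> bool:
    # Mirrors STARTSWITH(s.partitionKey, <id>)
    return partition_key.startswith(user_id)


def match_by_email(partition_key: str, email: str) -> bool:
    # Mirrors ENDSWITH(s.partitionKey, <email>)
    return partition_key.endswith(email)


keys = ["123user@example.com", "1234other@example.com", "999user@example.com"]

print([k for k in keys if match_by_email(k, "other@example.com")])
# ['1234other@example.com']
print([k for k in keys if match_by_id(k, "123")])
# ['123user@example.com', '1234other@example.com']
```

Because the key has no delimiter, the prefix "123" also matches the id "1234". Putting a separator character that cannot occur in an id (for example "|") between the two parts, and matching on "123|", is one way to avoid that.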
Your question does not have a standard answer. In my opinion, your solution (the mapping type) requires two queries, which is also inefficient. Choosing a partition key is always a matter of balancing the pros and cons. Please see the guidance in the official documentation.
Based on your description:
1. Looking up a user by id (most queries)
2. Looking up a user by email (login and some admin queries)
I suggest prioritizing the most frequent queries, that is to say, id.
My reason:
1. id won't change easily; it is relatively stable.
2. A session or cookie can be saved after login, so login is not accessed nearly as often as lookups by id.
3. id is your most frequent query condition, so you can't afford to query across all partitions every time.
4. If you are concerned about login performance, don't forget to include the email property in your indexing policy. That can also improve performance.
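By default, Cosmos DB indexes every property, so point 4 mainly matters when the container uses a custom, opt-in indexing policy. A sketch of such a policy, written as the Python dict that could be passed as the `indexing_policy` argument of `create_container` in the azure-cosmos Python SDK; the assumption here is that email is the only property any query needs to filter on.

```python
import json

# Opt-in indexing policy: index only the email property, exclude everything else.
# Assumption: no other property is used in query filters or ORDER BY.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/email/?"}],
    "excludedPaths": [{"path": "/*"}],
}

print(json.dumps(indexing_policy, indent=2))
```

If other properties do appear in query filters, they would need their own entries in `includedPaths`.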