Reputation: 15221

Cosmos DB user id/email as partition key

I have a dilemma about choosing the best (synthetic) value for the partition key when storing user data.

User document has:

- id (guid)
- email (used to login, e.g.)
- profile data

There are 2 main types of queries:

Looking for user by id (most queries)
Looking for user by email (login and some admin queries)

I want to avoid cross partition queries.

If I choose id for partitionKey (synthetic field), then login queries would be cross-partition. On the other hand, if I choose email, then it becomes a problem if the user ever changes their email.

What I am thinking of is introducing a new document type within the collection. Something like:

userId: guid,
userEmail: "email1",
partitionKey: "users-mappings"

Then I can have the User document itself as:

id: someguid,
type: "user",
partitionKey: "user_someguid",
profileData: {}

That way, when a user logs in, I first look up the mapping document by email in the mappings partition, get the guid, and then read the actual User document by guid.

Also, this way the email can be changed without affecting partitioning.
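Roughly, the flow would look like the sketch below. It is an in-memory stand-in only: a plain dict keyed by (partitionKey, id) plays the role of the container, and the helper names (`create_user`, `get_user_by_email`, `change_email`) are hypothetical. It also assumes the mapping document uses the email as its `id`, which the documents above do not spell out.

```python
import uuid

# In-memory stand-in for one container: {(partitionKey, id): document}.
# Each lookup by (partitionKey, id) models a single-partition point read.
container = {}

MAPPINGS_PK = "users-mappings"  # partition holding all email -> guid mappings


def upsert(doc):
    container[(doc["partitionKey"], doc["id"])] = doc


def point_read(partition_key, doc_id):
    return container.get((partition_key, doc_id))


def create_user(email, profile):
    user_id = str(uuid.uuid4())
    upsert({"id": user_id, "type": "user",
            "partitionKey": f"user_{user_id}", "profileData": profile})
    # Assumption: the mapping document's id is the email itself.
    upsert({"id": email, "type": "mapping", "partitionKey": MAPPINGS_PK,
            "userId": user_id, "userEmail": email})
    return user_id


def get_user_by_id(user_id):
    # One read, scoped to the user's own partition.
    return point_read(f"user_{user_id}", user_id)


def get_user_by_email(email):
    # Two reads: mapping first, then the user document.
    mapping = point_read(MAPPINGS_PK, email)
    if mapping is None:
        return None
    return get_user_by_id(mapping["userId"])


def change_email(old_email, new_email):
    # Only the mapping document is replaced; the user document
    # and its partition key stay untouched.
    mapping = container.pop((MAPPINGS_PK, old_email))
    mapping["id"] = new_email
    mapping["userEmail"] = new_email
    upsert(mapping)
```

Login would call `get_user_by_email(...)`, everything else `get_user_by_id(...)`; neither needs to look outside a single known partition.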

Is this a valid approach? Are there any problems with it? Am I missing something?

Upvotes: 3

Answers (2)

Ron Norman

Reputation: 41

As you already know, when querying Cosmos DB, a fan-out (cross-partition) query should be the last resort, especially for a high-volume action such as logging in. The cost in RUs will also be significantly higher as the data grows.

In the Cosmos DB SQL API, one pattern is to use synthetic partition keys. You can compose a synthetic partition key by concatenating the id and the email on write. This pattern works for a variety of query scenarios and provides flexibility.

Something like this:

{
  "id": "123",
  "email":"[email protected]",
  "partitionKey":"[email protected]"
}

Then on read, do something like this:

SELECT  s.something
FROM    s
WHERE   STARTSWITH(s.partitionKey, "123")
        OR
        ENDSWITH(s.partitionKey, "[email protected]")

You can also use SUBSTRING() and similar string functions.

With the above approach, you can search for a user either by their id or email and still use the efficiency of a partition key, minimizing your query RU cost and optimizing performance.

Upvotes: 1

Jay Gong

Reputation: 23782

Your question does not have a standard answer. In my opinion, your mapping-type solution requires two queries, which is also inefficient. Choosing a partition key is always a process of balancing the pros and cons. Please see the guidance in the official documentation.

Based on your description:

1. Looking for user by id (most queries)

2. Looking for user by email (login and some admin queries)

I suggest you prioritize the most frequent queries, that is to say, id.

My reasons:

1. The id won't change easily; it is relatively stable.

2. A session or cookie can be saved after login, so lookups by email (login) happen far less often than lookups by id.

3. id is your most frequent query condition, so you can't afford to cross all partitions every time for it.

4. If you are concerned about login performance, don't forget to make sure the indexing policy covers the email property. That can also improve performance.
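As a minimal sketch of what this looks like in code, assuming the `azure-cosmos` Python SDK (v4) and a container whose partition key path is `/id`: the function names are hypothetical, and `container` is expected to be a `ContainerProxy` (or anything exposing the same `read_item` and `query_items` methods).

```python
def get_user_by_id(container, user_id):
    """Frequent path: point read, served by a single partition.

    Assumes the container is partitioned on /id, so the document id
    and the partition key value are the same string.
    """
    return container.read_item(item=user_id, partition_key=user_id)


def find_user_by_email(container, email):
    """Rare path (login, admin): cross-partition query on email."""
    items = container.query_items(
        query="SELECT * FROM c WHERE c.email = @email",
        parameters=[{"name": "@email", "value": email}],
        enable_cross_partition_query=True,
    )
    # Emails are assumed to be unique, so take the first match if any.
    return next(iter(items), None)
```

After a successful login the application can keep the user id in the session, so later requests go through `get_user_by_id` only.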

Upvotes: 1
