Abner Escócio

Reputation: 2785

Is it a good idea to use a hash (SHA-1) as the ID for a Firestore document?

My scenario is like follows:

  1. I'm using the Bing News API, which returns a list of objects like the following:
{
    "name": "Eterna Resenha contará com as participações de Neto e Vampeta",
    "url": "https://www.terra.com.br/esportes/lance/eterna-resenha-contara-com-as-participacoes-de-neto-e-vampeta,82e493e511734febfcdfda6fbd22c105xjafr9k2.html",
    "image": {
        "contentUrl": "http://p2.trrsf.com/image/fget/cf/800/450/middle/images.terra.com/2020/05/27/5ece8e302d1fb.jpeg",
        "thumbnail": {
            "contentUrl": "https://www.bing.com/th?id=ON.4E1CF6986982B70A3D6009F435822EF2&pid=News",
            "width": 700,
            "height": 393
        }
    },
    "description": "Durante a quarentena, as lives tomaram conta do país, tentando arrecadar doações para ajudar quem sofre com o coronavírus...",
    "provider": [
        {
            "_type": "Organization",
            "name": "Terra"
        }
    ],
    "datePublished": "2020-05-28T00:00:00.0000000Z",
    "category": "Entertainment"
}
  2. Note that this object has no id field, so I improvised one: I convert the datePublished field to a Date, call getTime() to get a numeric timestamp, and concatenate it with the news language as follows:
const time = new Date(news.datePublished).getTime()
const id = `${language}${time}`

await database.collection(`news`).doc(`${id}`).set(news, { merge: true })
  3. This solution breaks down when the Bing API returns the same news item with an updated date, which causes the object to be duplicated in my Firestore database.
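
To make the duplication concrete, here is a small sketch. The `timeBasedId` helper and the two sample objects are made up for illustration; they only mirror the ID scheme described above.

```javascript
// Hypothetical helper mirroring the time-based ID scheme from the question.
function timeBasedId(language, news) {
  const time = new Date(news.datePublished).getTime();
  return `${language}${time}`;
}

// Two invented API results: same url, different datePublished.
const first = {
  url: 'https://example.com/story.html',
  datePublished: '2020-05-28T00:00:00.000Z',
};
const updated = {
  url: 'https://example.com/story.html',
  datePublished: '2020-05-29T00:00:00.000Z',
};

// The IDs differ, so set() would write two separate documents.
console.log(timeBasedId('pt', first) === timeBasedId('pt', updated)); // false
```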

The solution I plan to use

Transform the news url into a hash using the sha1 algorithm as follows:

const CryptoJS = require("crypto-js");
const id = `${CryptoJS.SHA1(news.url)}`

await database.collection(`news`).doc(`${id}`).set(news, { merge: true })
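
As an aside, the same kind of ID can probably be derived without an extra dependency by using Node's built-in crypto module. This is only a sketch: `urlToDocId` is a hypothetical helper name and the sample URL is invented.

```javascript
const { createHash } = require('crypto');

// Hypothetical helper: maps a URL to a 40-character hex SHA-1 digest.
function urlToDocId(url) {
  return createHash('sha1').update(url, 'utf8').digest('hex');
}

const idA = urlToDocId('https://example.com/story.html');
const idB = urlToDocId('https://example.com/story.html');

console.log(idA === idB); // true: the same URL always maps to the same ID
console.log(idA.length);  // 40
```

Because the ID depends only on the URL, a later set(news, { merge: true }) for the same article should update the existing document rather than create a new one, regardless of datePublished.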

The Firestore best practices guide for document creation leaves room for IDs in this format. My main concern is performance with a long ID (d40e5b8df6462e138fe617a84ddabae7f78360a6), since I will have thousands of news items in at least 5 languages.
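
For a sense of scale, the sketch below measures the example ID. The 1,500-byte figure is an assumption taken from the document ID limit listed in Firestore's usage and limits documentation, so it is worth checking against the current docs.

```javascript
// Byte length of the example ID from the question.
const exampleId = 'd40e5b8df6462e138fe617a84ddabae7f78360a6';
const idBytes = Buffer.byteLength(exampleId, 'utf8');

// Assumption: document ID limit as listed in Firestore's quota documentation.
const MAX_DOC_ID_BYTES = 1500;

console.log(idBytes);                     // 40
console.log(idBytes <= MAX_DOC_ID_BYTES); // true
```

For comparison, the auto-generated IDs shown in the answer below are 20 characters long, so a hex SHA-1 digest is only twice that size.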

Remember: I need traceable IDs (derived from some property of the object), because Bing News can return the same article with a different datePublished, and I will then need to update the existing document.

Are there any counterpoints that should make me choose a different solution?

Upvotes: 2

Views: 1097

Answers (1)

Ogulcan Tümdoğan

Reputation: 111

You can use Firestore's default ID generator function. I am pretty sure a "big ID" won't cause a noticeable performance issue, since Google uses such a function to generate unique IDs in its databases.

Here's the function I've extracted and been using for my projects for a long while:

const generateId = function () {
    // Alphanumeric characters
    const chars =
        'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    let autoId = '';
    // Pick 20 characters at random from the alphabet above
    for (let i = 0; i < 20; i++) {
        autoId += chars.charAt(Math.floor(Math.random() * chars.length));
    }
    return autoId;
};

The probability of two documents getting the same ID is vanishingly small with this function, but you can also append a timestamp to the result, just to ease your mind.
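
As a rough back-of-the-envelope check of that claim, the sketch below estimates the size of the ID space and the chance of a collision with the usual birthday approximation. It assumes every character is drawn uniformly and independently, which Math.random does not strictly guarantee; `collisionProbability` is a hypothetical helper.

```javascript
// Size of the ID space: 62 possible characters in each of 20 positions.
const alphabetSize = 62;
const idLength = 20;
const bitsOfEntropy = idLength * Math.log2(alphabetSize);

// Birthday approximation: p ≈ n^2 / (2 * N) for n IDs drawn from N possibilities.
function collisionProbability(n) {
  const space = Math.pow(alphabetSize, idLength);
  return (n * n) / (2 * space);
}

console.log(bitsOfEntropy.toFixed(1));  // "119.1"
console.log(collisionProbability(1e6)); // estimate for one million documents
```

Even for a million documents the estimate stays many orders of magnitude below anything that would matter in practice.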

Upvotes: 1
