joshpj1

Reputation: 186

High read/write data storage

I'm designing a short-linking service. When people click one of my short links, I want to collect some data, such as IP address and user agent, and then forward them to their destination. What would be the best way to store this data if my links table grows into the tens of millions? I'm not sure whether to use SQL or something like Elasticsearch.

Upvotes: 1

Views: 259

Answers (1)

Eirini Graonidou

Reputation: 1566

This is quite an opinion-based question, but I will try to answer. The performance of your operations depends mainly on the kind of queries you are going to run, so the real question is: what do you want to do with this data? Some concepts that matter when dealing with a large amount of data follow:

Bulk Insert

If you have to save a large number of records in one request, both RDBMSs and Elasticsearch offer the means to do that (PostgreSQL: populating a database; Elasticsearch: Bulk API).
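As a minimal sketch of what a bulk request could look like on the Elasticsearch side, the snippet below builds the newline-delimited JSON body that the Bulk API accepts: one action line followed by one document line per record. The function name `build_bulk_body`, the index name, and the click fields are assumptions made for illustration, and actually sending the body to a cluster is left out.

```python
import json


def build_bulk_body(index_name, click_events):
    """Serialize click events into a newline-delimited JSON bulk body."""
    lines = []
    for event in click_events:
        # Action line: ask Elasticsearch to index the next line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(event))
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


# Hypothetical click records; the field names are illustrative only.
events = [
    {"short_code": "abc123", "ip": "203.0.113.7", "user_agent": "Mozilla/5.0"},
    {"short_code": "abc123", "ip": "198.51.100.4", "user_agent": "curl/8.0"},
]
body = build_bulk_body("2018-03-traffic", events)
```

In a link shortener, clicks could be buffered for a short time and flushed in batches like this rather than written one request per click.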

Data Partitioning

If you deal with a huge amount of data that is constantly growing, query execution time can grow along with it. At some point, you might realize that you need to apply data partitioning.

With Elasticsearch you can create time-based indices: you can save this "traffic analytics" data into indices like 2018-03-traffic, 2018-04-traffic, etc. Then you can refer to them under one name using aliases. Please refer to the what-are-aliases-in-elasticsearch-for question. PostgreSQL also offers the means for table partitioning.
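The time-based naming scheme can be sketched in a few lines. The helper names `monthly_index_name` and `alias_actions` and the alias name `traffic` are assumptions for illustration; the second helper only builds the request body for the aliases endpoint and does not talk to a cluster.

```python
from datetime import datetime, timezone


def monthly_index_name(clicked_at, suffix="traffic"):
    """Return the monthly index name for a click timestamp, e.g. 2018-03-traffic."""
    return f"{clicked_at:%Y-%m}-{suffix}"


def alias_actions(index_name, alias="traffic"):
    """Build the body of an aliases request that adds index_name to one shared alias."""
    return {"actions": [{"add": {"index": index_name, "alias": alias}}]}


march_click = datetime(2018, 3, 15, 12, 30, tzinfo=timezone.utc)
index_name = monthly_index_name(march_click)
actions = alias_actions(index_name)
```

Writes go to the index for the current month, while searches can target the alias and so cover all months at once.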

So far so good, let's see some other aspects:

Data Structure

  1. Does your schema consist of strict, predefined, complicated rules?
    If not (and I think that is your case), you could use Elasticsearch.

  2. Will you need to add fields to, or remove fields from, your existing schema in the future?
    Elasticsearch is more flexible about accepting new fields in an existing index (you actually don't need to do anything), whereas in an RDBMS you have to manage it yourself, i.e. update the table definition.
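To illustrate what "update the table definition" means on the relational side, here is a small sketch. It uses Python's built-in SQLite module purely as a stand-in for an RDBMS (it is not PostgreSQL), and the `clicks` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database, used only as a stand-in for a relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (short_code TEXT, ip TEXT, user_agent TEXT)")
conn.execute(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    ("abc123", "203.0.113.7", "Mozilla/5.0"),
)

# Collecting a new field later requires an explicit schema change first.
conn.execute("ALTER TABLE clicks ADD COLUMN referrer TEXT")
conn.execute(
    "INSERT INTO clicks VALUES (?, ?, ?, ?)",
    ("abc123", "198.51.100.4", "curl/8.0", "https://example.com/"),
)

columns = [row[1] for row in conn.execute("PRAGMA table_info(clicks)")]
rows = conn.execute(
    "SELECT short_code, referrer FROM clicks ORDER BY rowid"
).fetchall()
```

Rows written before the change simply have no value for the new column, while an Elasticsearch index with default dynamic mapping would accept a document carrying the extra field without a prior schema step.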

Opinion based conclusion

The assumptions described above, together with the assumption that you would at some point like to run analytics on this data and visualize the results, lead me to conclude that Elasticsearch could be a better fit in your case. With Kibana, you get visualization out of the box.

Notes:
1. I use PostgreSQL for the given RDBMS links because I am familiar with it.
2. You should also consider the scalability of an RDBMS vs. Elasticsearch.

Upvotes: 1
