user12384512

Reputation: 3401

Data schema for storing a large number of rows

I'm working on a project with the following domain structure:

  1. Users. There can be multiple users.
  2. Projects. Every user can create multiple projects.
  3. Keywords. Every project may contain many keywords, up to 200,000 per project. Each keyword is a string of up to 300 characters.

If there are at least 1,000 users, each with 10 projects, there will be 1,000 * 10 * 200,000 = 2 billion keywords to store.

Use-cases:

  1. A user uploads 200,000 keywords at once. The insert should be very fast (see the batch-insert sketch after this list).
  2. A user deletes many keywords at once, based on a search query.
  3. A user updates (renames) one keyword or a few keywords.
  4. A user searches over the keywords with a wildcard search (%...%). The search can be done in Java memory if the database does not support it.
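
For use-case 1, a JDBC batch insert inside a single transaction is usually the fastest portable option. A minimal sketch, assuming a keyword(project_id, text) table and a hypothetical connection URL:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.List;

    public class KeywordLoader {

        // Table/column names and the JDBC URL are assumptions for illustration.
        public static void insertKeywords(long projectId, List<String> keywords) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "pass")) { // hypothetical URL
                conn.setAutoCommit(false); // one transaction for the whole upload
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO keyword (project_id, text) VALUES (?, ?)")) {
                    int n = 0;
                    for (String kw : keywords) {
                        ps.setLong(1, projectId);
                        ps.setString(2, kw);
                        ps.addBatch();
                        if (++n % 1000 == 0) ps.executeBatch(); // flush every 1,000 rows
                    }
                    ps.executeBatch(); // flush the remainder
                }
                conn.commit();
            }
        }
    }

Batching avoids a network round trip per row; database-specific bulk loaders (e.g. PostgreSQL's COPY) can be faster still.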

Possible approaches:

  1. A single SQL table with a proper index on projectId (a schema sketch follows this list). I suspect it could be slow and frustrating even with indexes.
  2. An SQL table with partitioning, based on userId for example. In this case it's not clear what the hashing function should look like.
  3. Serialize the whole collection as a blob and store it in a column of the Projects table. Updating even a single row would require re-serializing the whole collection.
  4. Use MongoDB (or another NoSQL database) and store all data in one collection with proper indexes. Will it be faster than a single SQL table? I'm not sure.
  5. Use NoSQL and create a new collection on the fly for every created project. MongoDB has a restriction of about 24,000 namespaces per database.
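
For option 1, the schema itself is small. A sketch, assuming PostgreSQL and hypothetical table, column, and index names:

    import java.sql.Connection;
    import java.sql.Statement;

    public class SchemaSetup {

        // Assumes PostgreSQL; table, column and index names are illustrative.
        public static void createSchema(Connection conn) throws Exception {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE keyword ("
                        + " id BIGSERIAL PRIMARY KEY,"
                        + " project_id BIGINT NOT NULL," // references the projects table
                        + " text VARCHAR(300) NOT NULL)");
                // Composite index: every query is scoped to a single project.
                st.execute("CREATE INDEX idx_keyword_project ON keyword (project_id, text)");
            }
        }
    }

Note that a composite index on (project_id, text) keeps lookups scoped to one project, but a LIKE '%...%' search with a leading wildcard still cannot use a B-tree index, which is why option 1 tends to be slow for use-case 4.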

What is the preferable database and table structure for storing such data? I think the best solution is option 5.

Upvotes: 0

Views: 113

Answers (1)

Manish Kumar

Reputation: 589

Just give Solr or Lucene a try; your problem seems to be mostly about searching. I had the same scenario and implemented it with Solr in Java. A minimal SolrJ sketch is below.
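
A sketch of that approach with SolrJ; the core name ("keywords") and the field names are assumptions, and the text field would need to be defined in the Solr schema:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    public class KeywordSearch {

        public static void main(String[] args) throws Exception {
            // Core name ("keywords") and field names are assumptions.
            try (SolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/keywords").build()) {

                // Index one keyword document.
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "42-first-keyword");
                doc.addField("projectId", 42);
                doc.addField("text", "first keyword");
                solr.add(doc);
                solr.commit();

                // Wildcard search scoped to one project, like SQL's LIKE '%word%'.
                // Leading wildcards work but are slow without
                // ReversedWildcardFilterFactory or n-gram analysis.
                SolrQuery query = new SolrQuery("text:*word*");
                query.addFilterQuery("projectId:42");
                QueryResponse rsp = solr.query(query);
                rsp.getResults().forEach(d -> System.out.println(d.getFieldValue("text")));
            }
        }
    }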

Upvotes: 1
