Reputation: 1263
I am a Java developer. I want to know what is the best way to store huge data into mysql using Java.
Huge: two hundred thousand talk messages every second.
An index is not needed here
Should I store the messages into the database as soon as the user creates them? Will it be too slow?
Upvotes: 3
Views: 2791
Reputation: 5684
I guess, typical access would involve retrieving all text of one chat session at least.
The number of rows is large and your data is not so much relational. This is a good fit for Non-Relational database.
If you still want to go with MySQL, use Partitions. While writing, use batch inserts and while reading provide sufficient Partition pruning hints in your queries. Use EXPLAIN PARTITIONS
to check whether partitions are being pruned. In this case I would strongly recommend that you combine chat lines of a one chat session into a single row. This will dramatically reduce the number of rows as compared to one chat line per row.
You didn't mention how many many days of data you want to store.
On a separate note: How successful would your app have to be in terms of users to require 200k messages per second? An active chat session may generate about 1 message every 5 seconds from one user. For ease of calculation lets make it 1 second. So you are building capacity for 200K online users. Which implies you would at least have a few million users.
It is good to think of scale early. However, it requires engineering effort. And since resources are limited, allocate them carefully for each task (Performance/UX etc). Spending more time on UX, for example, may yield a better ROI. When you get to multi-million user territory, new doors will open. You might be funded by an Angel or VC. Think of it as a good problem to have.
My 2 cents.
Upvotes: 0
Reputation: 2641
My suggestion is also MongoDB. Since NoSQL paradigm fits your needs perfectly. Below is a flavor of MongoDB in Java -
BasicDBObject document = new BasicDBObject();
document.put("database", "mkyongDB");
document.put("table", "hosting");
BasicDBObject documentDetail = new BasicDBObject();
documentDetail.put("records", "99");
documentDetail.put("index", "vps_index1");
documentDetail.put("active", "true");
document.put("detail", documentDetail);
collection.insert(document);
This tutorial is for good to get started. You can download MongoDB from github.
For optimization of MongoDB please refer this post.
Upvotes: 1
Reputation: 57192
There are at least 2 different parts to this problem:
Processing the messages for storage in the database
What type of storage to use for the message
For processing the messages, you're likely going to need a horizontally scalable system (meaning you can add more machines to process the messages quickly) so you don't accumulate a huge backlog of messages. You should definitely not try to write these messages synchronously, but rather when a message is received, put it on a queue to be processed for writing to the database (something like JMS comes to mind here).
In terms of data storage, MySQL is a relational database, but it doesn't sound like you are really doing any relational data processing, rather just storing a large amount of data. I would suggest looking into a NoSQL database (as others have suggested here as well) such as MongoDB, Cassandra, CouchDB, etc. They each have their strengths and weaknesses (you can read more about each of them on their respective websites and elsewhere on the internet).
Upvotes: 0
Reputation: 7212
1 billion writes / day is about 12k / second. Assuming each message is about 16 bytes, that's about 200k / sec. If you don't care about reading, you can easily write this to disk at this rate, maybe one message per line. Your read access pattern is probably going to dictate what you end up needing to do here.
If you use MySQL, I'd suggest combining multiple messages per row, if possible. Partitioning the table would be helpful to keep the working set in memory, and you'll want to commit a number of records per transaction, maybe 1000 rows. You'll need to do some testing and tuning, and this page will be helpful:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
You should probably also look at Cassandra which is written with heavy write workloads in mind.
Upvotes: 1
Reputation: 4829
Do you have to absolutely use MySQL or Are you open to other DBs as well? MongoDb or CouchDB will be a good fit for these kind of needs. Check them out if you are open to other DB options.
If you have to go absolutely with MySql, then we have done something similar all the related text messages go in a child as single json. We append to it every time and we keep master in a separate table. So one master and one child record at the minimum and more child records as the messages go beyond certain number ( 30 in our scenario) , implemented kind of "load more.." queries second child record which holds 30 more.
Hope this helps.
FYI, we are migrating to CouchDB for some other reasons and needs.
Upvotes: 0