Reputation: 145
Hello Im searching for an advice concerning the Database (mysql) of my Webcrawler project (in java).
I am collecting information about different channels (about 2000) on twitter, some have many tweets, some less. In summary I only need to store the channel information (name, id) and some basic information on each tweet (id, time, likes, retweets).
My first idea was to have a Table "channels" having a row for each channel, and in each row having a BLOB field "tweets" containing all tweets (from 10 to 10000). Now this version is a little unflexible, when querying single tweets or a specific timespan. But when having a separate table for all tweets, I'm afraid of performance issues, i. e. searching for 200 elements in a list of about 1000000 tweets..
How would I solve this situation? I dont have the capacities to test both situations for performance, thats why I'm asking :/
Thank you for reading!
Upvotes: 0
Views: 49
Reputation: 39907
Premature optimization is the root of all evil.
Go with a separate table.
Upvotes: 0
Reputation: 1953
Two tables
channels: id | channelId | name
tweets: id | tweetId | channelId | time | likes | retweets
The id is the internal database id from your database (not the tweet or channel id). The channelId from the tweets table refers to the internal database id for the channel it belongs to (foreign key)
For your performance concerns: How many rows in a database are TOO MANY?
Upvotes: 1