Reputation: 47
I am making an application that will process a big chunk of information. This information was retrieved using some web crawlers and is about news, containing data such as News Title, URL, Publication Date, Category and Content. The crawled data is in XML format, and I will load that information into my application.
From there, 10 users will process every news and tag the category of the news manually by reading the title. I've used 9 main categories to be used, and for each news the users will decide from 0-5 how much this particular news belongs to each category.
Users will also search for news by title and decide if this news is discussing the same event as another news, or if this news is similar to another news (like a news about a football game at 3 PM and a news about a football game at 5 PM).
I have no problems with making the application itself, I just need some help with the part of how to design one/many tables that can link news that talk about similar events or about the same event, since there can be many many news that can talk about the same event.
So far I've done something like this:
Table News
ID
Title
URL
PublicationDate
NewsContent
Table Category
NewsID
User_ID
Economy
Politics
Present_Day
Sport
Technology
Showbiz
Culture
Region
World
Table User
ID
FirstName
LastName
Each category field in the Category table holds a tinyint data type (I'm using SQL Server) and I've restricted the values the user can input via a check constraint so that it allows values from 0 - 5. I don't know if this is the right approach so far for the database design, and I need to add the table/s that allow to add information about similar news or news that talk about the same event, such as a table Similar_News containing some fields like News_ID, SimilarNews1_ID, SimilarNews2_ID and so on, and must contain the information about which user 'said this', but this sounds like a flawed design to me.
Any help is appreciated, thank you.
Upvotes: 1
Views: 873
Reputation: 1088
Here are some suggestions. In the CATEGORY table you have created 9 different columns (economy, politics etc). What if a few days / months / years down the line there is a new category. In this case you will have to modify your database design. Instead, you could have the following structure of your CATEGORY table.
CategoryId
Category
And have one more table to store the actual news categorization.
Table: NewsCategory
NewsId
CategoryId
CategoryWeight (This will store the rating from 1-5)
If the user feels that a news does not belong to a particular category then no row will be inserted in this table for that category. Such a structure will give you more flexibility to insert new categories in future without changing the database design. You just have to insert new rows in the Category table.
For storing similar news I would recommend one of the following approach.
Create a table event and store the details of the event in this table.
EventId
EventDescription
Sample Data
EventId: 55
EventDescription: Euro 2016 Belgium vs Italy
Now you can include this EventId in your News table. This way you can pull up all the news that are related to this event.
Upvotes: 1
Reputation: 821
Create Similar_News table with n fields to store the similar News_Id doesn't sound a good idea. How many fields would you create? 3? 10?
You are modeling a n to m relationship so I would only use two fields (three if you wanna store the User Id). For example if New 1 is similar to news 2 and 3. And New 3 is similar to 4, you insert the rows:
New_ID SImilar_New_Id
--------------------------
1 2
1 3
3 4
Another approach would be use a NOSQL DB to store flexible structures, for example:
"News":{
"User_Id":1,
"Category"{
"Economy":3,
"Politics":4
},
"Similar_News":[1,2,3]
}
Upvotes: 0