Reputation: 559
I currently have a database of articles that keeps track of the most read articles over a certain period of time by incrementing a "visits" counter on page_load. The current "visits" counter is a column in the articles table (see below):
id | title | description | visits | creation_date
---+--------+-------------+--------+-----------------
1 | test1 | test test.. | 10 | 2019-01-01
2 | test2 | test test.. | 20 | 2019-01-01
Sometimes I experienced connection timeouts, and I suspected a deadlock in the "visits" write procedure (the database locks if concurrent users increment the same row at once). I thought of the scenario below as an enhancement:
Remove the Visits counter from the Articles table and create a new table article_visits with two columns: article_id and date.
Articles
id | title | desc | creation_date
---+-------+------+---------------
1 | test1 | desd | 2019-01-01
2 | test2 | desd | 2019-01-01
article_visits
article_id | visit_date
-----------+----------------------
1 | 2019-01-01
1 | 2019-01-01
1 | 2019-01-01
1 | 2019-01-01
1 | 2019-01-01
1 | 2019-01-01
2 | 2019-01-01
2 | 2019-01-01
2 | 2019-01-01
As an alternative, on each new visit I insert a new row into the article_visits table to avoid any deadlocks on the articles table. This solution will make the article_visits table grow very quickly, but I don't think table size is a problem.
I would like to know if this is the proper way to log article visits, and if this optimization is a better option than the original solution.
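The per-visit logging could be sketched like this (a minimal sketch; syntax assumes SQL Server, and @article_id is a hypothetical parameter supplied by the application):

```sql
-- Log one row per visit; no shared counter row, so concurrent visitors
-- never contend for the same row.
INSERT INTO article_visits (article_id, visit_date)
VALUES (@article_id, GETDATE());
```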
Upvotes: 2
Views: 444
Reputation: 5094
The current Articles table is not in normalized form. I would say that putting a visits column in the Articles table is not a proper way of de-normalization.
The current Articles table not only gives you deadlock issues, it also prevents you from producing many other types of report, such as a daily visit report or a weekly visit report.
Creating an Article_visits table is a very good move. It will be updated very frequently. My Article_visits design:
article_visit_id | article_id | visit_date | visit_count
-----------------+------------+------------+------------
1                | 1          | 2019-01-01 | 6
2                | 2          | 2019-01-01 | 3
Here Article_Visit_id is int identity(1,1), which is also the clustered index.
Create NonClustered Index NCI_Articleid_date ON Article_visits(article_id,visit_date)
GO
In short, creating the clustered index on article_id, visit_date would be an expensive affair.
If no record exists for that article on that date, insert one with visit_count = 1; if it exists, update visit_count, i.e. increase it by 1.
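The insert-or-update step described above could be sketched in T-SQL like this (a sketch only; @Articleid and @Today are assumed application parameters):

```sql
-- Try the common case first: the row for today already exists.
UPDATE Article_Visit
SET Visit_Count = Visit_Count + 1
WHERE Articleid = @Articleid
  AND Visit_Date = @Today;

-- If no row was touched, this is the first visit today: insert it.
IF @@ROWCOUNT = 0
    INSERT INTO Article_Visit (Articleid, Visit_Date, Visit_Count)
    VALUES (@Articleid, @Today, 1);
```

Note that under heavy concurrency the UPDATE/INSERT pair can race (two sessions may both see zero rows updated); wrapping both statements in a transaction with an UPDLOCK/HOLDLOCK hint on the first is one common mitigation.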
You could also consider an Indexed View.
Actual table design:
Create Table Article(Articleid int identity(1,1) primary key
,title varchar(100) not null,Descriptions varchar(max) not null
,CreationDate Datetime2(0))
GO
Create Table Article_Visit(Article_VisitID int identity(1,1) primary key
,Articleid int not null
,Visit_Date datetime2(0) not null
,Visit_Count int not null)
GO
--Create Trusted FK
ALTER TABLE Article_Visit
WITH NOCHECK
ADD CONSTRAINT FK_Articleid FOREIGN KEY(Articleid)
REFERENCES Article(Articleid) NOT FOR REPLICATION;
GO
--Create NonClustered Index NCI_Articleid_Date on
-- Article_Visit(Articleid,Visit_Date)
--Go
Create NonClustered Index NCI_Articleid_Date1 on
Article_Visit(Visit_Date)include(Articleid)
Go
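For the Indexed View mentioned above, a sketch might look like this (assuming SQL Server and that Article_Visit lives in the dbo schema; the view and index names are made up):

```sql
-- Pre-aggregate visits per article per day; the unique clustered index
-- materializes the view so report queries don't re-scan Article_Visit.
CREATE VIEW dbo.vw_Article_Daily_Visits
WITH SCHEMABINDING
AS
SELECT Articleid,
       Visit_Date,
       SUM(Visit_Count) AS Total_Visits,
       COUNT_BIG(*)     AS Row_Count  -- required in indexed views with GROUP BY
FROM dbo.Article_Visit
GROUP BY Articleid, Visit_Date;
GO
CREATE UNIQUE CLUSTERED INDEX UCI_vw_Article_Daily_Visits
ON dbo.vw_Article_Daily_Visits (Articleid, Visit_Date);
GO
```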
Create a trusted FK to get the index-seek benefit (in short).
I think NCI_Articleid_Date is no longer required, because Articleid is a trusted FK.
Deadlock issue: the trusted FK was also created to overcome the deadlock issue.
Deadlocks often occur due to bad application code, un-optimized SQL queries, or bad table design. Besides these, there are several other valid reasons, like handling race conditions. It is quite a DBA thing. If deadlocks hurt too much then, after addressing the reasons above, you may have to adjust the isolation level.
Many deadlock issues are handled automatically by SQL Server itself.
There are many articles online on deadlock causes.
I don't think table size is a problem
Table size is a big issue. The chances of a deadlock in both designs are very, very low, but you will always face the other demerits of a big table.
I suggest reading a few more articles. Some questions to consider:
Is this exactly your real table, with the same data types?
How frequently will each table be inserted into or updated?
Which table will be queried more frequently?
What is the concurrent use of each table?
Deadlocks can only be minimized to the point that there is no performance or transaction issue.
What is the relation between Visitorid and Articleid?
Upvotes: 0
Reputation: 416081
This is certainly valid, though you may want to do some scoping on how much additional storage and memory load this will require for your database server.
Additionally, I might add a full datetime or datetime2 column for the actual timestamp (in addition to the current date column rather than instead of it, since you'll want to aggregate by date only, and having that value pre-computed can improve performance), and perhaps a few other columns such as IP address and referrer. Then you can use this data for additional purposes, such as auditing and tracking referrer/advertiser ROI.
Upvotes: 2
Reputation: 70538
I'm interested to understand why you are getting a deadlock. A DB platform should be able to handle an update tablename set field = field + 1
concurrently just fine. Here the table or row will lock and then release, but the hold time should not be long enough to cause a deadlock error.
You COULD get a deadlock error if you are updating or locking more than one table within a transaction across multiple tables, especially if you do them in a different order.
So the question is: in your original code, are you linking to multiple tables when you do the update statement? The solution could be as simple as making your update atomic to one table.
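The single-table atomic increment suggested above could be sketched like this (table and column names from the question; @article_id is an assumed parameter):

```sql
-- One statement, one table: the row lock is taken and released within a
-- single atomic operation, so there is no lock ordering to get wrong.
UPDATE articles
SET visits = visits + 1
WHERE id = @article_id;
```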
However, I do agree -- the table you describe is a more functional design.
Upvotes: 0
Reputation: 1270663
This is a fine way to record article visits. It is much less (or not at all) prone to deadlocks, because you are basically just appending new rows.
It is more flexible. You can get the number of visits between two dates, for instance, and that range can be defined at query time. You can also store the exact time, to determine whether there are time-of-day preferences for views.
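The date-range counting described above could be sketched like this (table and column names from the question; the date literals are examples):

```sql
-- Count visits per article for January 2019, computed at query time
-- from the raw per-visit rows.
SELECT article_id, COUNT(*) AS visits
FROM article_visits
WHERE visit_date >= '2019-01-01'
  AND visit_date <  '2019-02-01'
GROUP BY article_id;
```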
The downside is performance on querying. If you frequently need the counts, then the calculation can be expensive.
If this is an issue, there are multiple possible approaches:
Upvotes: 2