Reputation: 2056
I've been trying to teach myself some basic SQL to implement into an application I'm creating using database management.
Quickly after implementation, I realised that I wasn't sure how to deal with duplicate entries. Looking online it seems that I need to assign a 'Unique' constraint to each piece of data that is likely to be duplicated, but I wanted to know if I could approach this in a different way.
In conclusion, I've come up with a different idea in which to approach this issue, I'm not sure how to write this in SQL or even if it's a feasible idea.
I'd like some outside input to either help me accomplish this or to disprove my idea.
Using the dart language I can seek all duplicate entries, and how many duplicates are present. This is what I'm using to get this result:
SELECT `title`, COUNT(*) FROM `Article` GROUP BY `url` HAVING COUNT(*) > 1
And this is the result
flutter: [{title: US Stock Market Chalks up Huge Weekly Losses; Bitcoin, DXY Impress, COUNT(*): 2}, {title: What Triggered Ethereum to Rise 66% Within 1 Week? Factors and Trends, COUNT(*): 26}, {title: What’s Next for Crypto Miners?, COUNT(*): 2}]
Could I perhaps delete all duplicate results by the amount returned by
COUNT(*)
minus one?This would mean that if I had 26 duplicates of
What Triggered Ethereum to Rise 66% Within 1 Week? Factors and Trends
, I could delete the number represented byCOUNT(*)
, being 26 but before deletion minus 1 from the number 26, effectively leaving me with one duplicate left over? (being the desired result of using the aforementioned 'Unique' constraint)
Edit: This is the SQL line I use to input a new row of data to my database.
'INSERT INTO Article(title, description, url, urltoimage, publishedat) VALUES("$title", "$description", "$url", "$urlToImage", "$publishedAt")'
Upvotes: 1
Views: 280
Reputation: 57043
You could delete all bar 1 entry an a per url basis - assuming that the url column should be unique - if a combination of more than just the url column or another column simply change the first two WHERE clauses. - the assumption is also made that the table is not a WITHOUT ROWID table.
:-
SELECT count() FROM article WHERE url = 'myurl';
WITH cte AS (
SELECT rowid AS thisrowid
FROM article
WHERE url = 'myurl'
ORDER by url
LIMIT (SELECT count(*)-1 FROM article WHERE url = 'myurl')
)
DELETE FROM article WHERE rowid IN (SELECT thisrowid FROM cte);
This uses a CTE (Common Table Expression (temporary table)) to extract the rowid (a special normally hidden column that always exists unless WITHOUT ROWID is specified) from all bar 1 row of the article table according to the selection criteria (WHERE clause) (in this case where the url column contains myurl). That is LIMIT is used and this is the number of rows according to the same WHERE clause less 1.
This CTE, a slist of the rowids is then used as the basis for the DELETE's WHERE clause.
The above was tested using the following SQL :-
DROP TABLE IF EXISTS article;
CREATE TABLE IF NOT EXISTS article (title TEXT, description TEXT, url TEXT, urltoimage TEXT, publisheddat);
INSERT INTO article VALUES
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01')
;
SELECT count() FROM article WHERE url = 'myurl';
WITH cte AS (
SELECT rowid
FROM article
WHERE url = 'myurl'
ORDER by url
LIMIT (SELECT count(*)-1 FROM article WHERE url = 'myurl')
)
-- SELECT * FROM cte;
DELETE FROM article WHERE rowid IN (SELECT * FROM cte);
SELECT count() FROM article WHERE url = 'myurl';
Number of rows before the deletion :-
number of remaining rows :-
Saying that it would probably be as easy to
The code could be (again assuming url column needs to be unique) :-
DROP TABLE IF EXISTS article_new;
DROP TABLE IF EXISTS article_old;
CREATE TABLE IF NOT EXISTS article_new (title TEXT, description TEXT, url TEXT UNIQUE, urltoimage TEXT, publisheddat);
INSERT OR IGNORE INTO article_new SELECT * FROM article;
ALTER TABLE article RENAME TO article_old;
ALTER TABLE article_new RENAME TO article;
DROP TABLE IF EXISTS article_old;
DROP TABLE IF EXISTS article;
CREATE TABLE IF NOT EXISTS article (title TEXT, description TEXT, url TEXT, urltoimage TEXT, publisheddat);
INSERT INTO article VALUES
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl','myimageurl','2018-01-01'),
('Title001','Description001','myurl2','myimageurl','2018-01-01'),
('Title001','Description001','myurl3','myimageurl','2018-01-01'),
('Title001','Description001','myurl3','myimageurl','2018-01-01'),
('Title001','Description001','myurl3','myimageurl','2018-01-01'),
('Title001','Description001','myurl3','myimageurl','2018-01-01'),
('Title001','Description001','myurl3','myimageurl','2018-01-01')
;
DROP TABLE IF EXISTS article_new;
DROP TABLE IF EXISTS article_old;
CREATE TABLE IF NOT EXISTS article_new (title TEXT, description TEXT, url TEXT UNIQUE, urltoimage TEXT, publisheddat);
INSERT OR IGNORE INTO article_new SELECT * FROM article;
ALTER TABLE article RENAME TO article_old;
ALTER TABLE article_new RENAME TO article;
DROP TABLE IF EXISTS article_old;
SELECT * FROM article;
Upvotes: 1