Jake
Jake

Reputation: 2056

SQL - Deleting multiple entries

I've been trying to teach myself some basic SQL to implement into an application I'm creating using database management.

Quickly after implementation, I realised that I wasn't sure how to deal with duplicate entries. Looking online it seems that I need to assign a 'Unique' constraint to each piece of data that is likely to be duplicated, but I wanted to know if I could approach this in a different way.

In conclusion, I've come up with a different idea in which to approach this issue, I'm not sure how to write this in SQL or even if it's a feasible idea.

I'd like some outside input to either help me accomplish this or to disprove my idea.

Using the dart language I can seek all duplicate entries, and how many duplicates are present. This is what I'm using to get this result:

SELECT `title`, COUNT(*) FROM `Article` GROUP BY `url` HAVING COUNT(*) > 1

And this is the result

flutter: [{title: US Stock Market Chalks up Huge Weekly Losses; Bitcoin, DXY Impress, COUNT(*): 2}, {title: What Triggered Ethereum to Rise 66% Within 1 Week? Factors and Trends, COUNT(*): 26}, {title: What’s Next for Crypto Miners?, COUNT(*): 2}]

Could I perhaps delete all duplicate results by the amount returned by COUNT(*) minus one?

This would mean that if I had 26 duplicates of What Triggered Ethereum to Rise 66% Within 1 Week? Factors and Trends, I could delete the number represented by COUNT(*), being 26 but before deletion minus 1 from the number 26, effectively leaving me with one duplicate left over? (being the desired result of using the aforementioned 'Unique' constraint)

Edit: This is the SQL line I use to input a new row of data to my database.

'INSERT INTO Article(title, description, url, urltoimage, publishedat) VALUES("$title", "$description", "$url", "$urlToImage", "$publishedAt")'

Upvotes: 1

Views: 280

Answers (1)

MikeT
MikeT

Reputation: 57043

You could delete all bar 1 entry an a per url basis - assuming that the url column should be unique - if a combination of more than just the url column or another column simply change the first two WHERE clauses. - the assumption is also made that the table is not a WITHOUT ROWID table.

:-

SELECT count() FROM article WHERE url = 'myurl';
WITH cte AS (
    SELECT rowid AS thisrowid
    FROM article 
    WHERE url = 'myurl'
    ORDER by url
    LIMIT (SELECT count(*)-1  FROM article WHERE url = 'myurl')
)
DELETE FROM article WHERE rowid IN (SELECT thisrowid FROM cte);

This uses a CTE (Common Table Expression (temporary table)) to extract the rowid (a special normally hidden column that always exists unless WITHOUT ROWID is specified) from all bar 1 row of the article table according to the selection criteria (WHERE clause) (in this case where the url column contains myurl). That is LIMIT is used and this is the number of rows according to the same WHERE clause less 1.

This CTE, a slist of the rowids is then used as the basis for the DELETE's WHERE clause.

Working Example SQL

The above was tested using the following SQL :-

DROP TABLE IF EXISTS article;
CREATE TABLE IF NOT EXISTS article (title TEXT, description TEXT, url TEXT, urltoimage TEXT, publisheddat);
INSERT INTO article VALUES
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01')
;
SELECT count() FROM article WHERE url = 'myurl';
WITH cte AS (
    SELECT rowid
    FROM article 
    WHERE url = 'myurl'
    ORDER by url
    LIMIT (SELECT count(*)-1  FROM article WHERE url = 'myurl')
)
-- SELECT * FROM cte;
DELETE FROM article WHERE rowid IN (SELECT * FROM cte);
SELECT count() FROM article WHERE url = 'myurl';

The Results from the two queries being :-

Number of rows before the deletion :-

enter image description here

number of remaining rows :-

enter image description here


Alternative simpler and fixes issue re-occurring

Saying that it would probably be as easy to

  1. create a new table e.g. article_new table including the UNIQUE constraint but otherwise identical and
  2. then populate this from the existing table (using INSERT OR IGNORE to not fail when dropping the duplicates).
  3. then rename the original table (could drop it but safer not to) e.g. to article_old
  4. then rename the new table to the original table name
  5. lastly drop the renamed (if happy to do so).

The code could be (again assuming url column needs to be unique) :-

DROP TABLE IF EXISTS article_new;
DROP TABLE IF EXISTS article_old;
CREATE TABLE IF NOT EXISTS article_new (title TEXT, description TEXT, url TEXT UNIQUE, urltoimage TEXT, publisheddat);
INSERT OR IGNORE INTO article_new SELECT * FROM article;
ALTER TABLE article RENAME TO article_old;
ALTER TABLE article_new RENAME TO article;
DROP TABLE IF EXISTS article_old;

Working Example (note more rows added with 3 different url columns) :-

DROP TABLE IF EXISTS article;
CREATE TABLE IF NOT EXISTS article (title TEXT, description TEXT, url TEXT, urltoimage TEXT, publisheddat);
INSERT INTO article VALUES
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl2','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl3','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl3','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl3','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl3','myimageurl','2018-01-01'),
    ('Title001','Description001','myurl3','myimageurl','2018-01-01')
;

DROP TABLE IF EXISTS article_new;
DROP TABLE IF EXISTS article_old;
CREATE TABLE IF NOT EXISTS article_new (title TEXT, description TEXT, url TEXT UNIQUE, urltoimage TEXT, publisheddat);
INSERT OR IGNORE INTO article_new SELECT * FROM article;
ALTER TABLE article RENAME TO article_old;
ALTER TABLE article_new RENAME TO article;
DROP TABLE IF EXISTS article_old;
SELECT * FROM article;

The result :-

enter image description here

Upvotes: 1

Related Questions