Brans Ds
Brans Ds

Reputation: 4117

Extremely long "Not In" request

I have two tables:

  1. "Sessions" - it have int key identity, "session_id" - varchar, "device_category" - varchar and some other colums. There are 149239 rows.

  2. Session_events" - it have int key identity, "session_id" - uniqueidentifier and some other fields. There are 3140768 rows there.

This tables has been imported from not relational database - Cassandra, so I not created any connections in MS SQL Server designer. But real connection between Sessions and Session_events on column session_id is Many-To-Many

Now I want to delete all web-sessions that was not take place on Personal Computer "device_category". So I run request Delete * FROM sessions where device_category != "PC" that was fast. Now I want to to delete all not PC sessions from Session_events table. So I run request

Delete FROM session_events where session_id Not In (SELECT distinct session_id FROM sessions)

That request is currently running for more then 24 hour and I don't know how long it can take...

(I have 16 GB ram and Intel Xenon).

I know that Left Join can be faster but 20% is not interesting. Do you see the way to finish my task much faster?

----
CREATE TABLE [dbo].[session_events](
    [key] [bigint] IDENTITY(1,1) NOT NULL,
    [session_id] [uniqueidentifier](max) NULL,
    [visitor_id] [uniqueidentifier] NULL,
    [shipping_method] [varchar](max) NULL,
    [shipping_price] [varchar](max) NULL,
    [site_id] [int] NULL,
    [stream_data_chunk] [varbinary](max) NULL,
    [total] [varchar](max) NULL,
    [total_inc_tax] [varchar](max) NULL,
    [tracker_ver] [varchar](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

CREATE TABLE [dbo].[visitor_sessions](
    [key] [int] IDENTITY(1,1) NOT NULL,
    [visitor_id] [varchar](max) NULL,
    [created] [varchar](max) NULL,
    [session_id] [varchar](max) NULL
)

 CONSTRAINT [PK_visitor_sessions4] PRIMARY KEY CLUSTERED 
(
    [key] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

Upvotes: 1

Views: 105

Answers (5)

kutsoff
kutsoff

Reputation: 345

try this code

delete e
from session_events e 
left join sessions s (nolock)
    on e.session_id = s.session_id
where s.session_id is null

Upvotes: -2

Pavel Gatnar
Pavel Gatnar

Reputation: 4053

Sometimes the problem by delete is it is waiting to acquire the lock for all relevant rows. Try to delete in a loop.

DECLARE @MyCursor CURSOR;
DECLARE @MyField YourFieldDataType;//replace with the data type of session_id
BEGIN
    SET @MyCursor = CURSOR FOR
    select session_id from session_events minus select session_id from sessions
    OPEN @MyCursor 
    FETCH NEXT FROM @MyCursor 
    INTO @MyField
    WHILE @@FETCH_STATUS = 0
    BEGIN
        delete session_events where session_id = @MyField
        FETCH NEXT FROM @MyCursor 
        INTO @MyField 
    END;
    CLOSE @MyCursor ;
    DEALLOCATE @MyCursor;
END;

you can also try to rewrite not in to an in:

delete from session_events where session_id in (select session_id from session_events minus select session_id from sessions)

Upvotes: 0

Luaan
Luaan

Reputation: 63732

Deleting a huge bunch of data at once means that the DB engine has to do all of that in a single transaction. This means a huge overhead when you don't actually need that (e.g. you don't need to rollback the whole operation, or you don't care about consistency - you just want to delete everything, if it fails in the middle, you'll just run the query again to delete the rest).

For your case, you could try deleting in batches. For example:

delete top 1000 from session_events where session_id Not In (SELECT distinct session_id FROM sessions)

Repeat until the table is empty.

Also, you have started from the wrong point. You might have been better off creating a foreign key between the two first, and using "on delete cascade". That would automatically delete all the session_events that no longer have a valid session. If you can start over, it might be significantly faster. No promises, though :D

Upvotes: 4

Vahid Farahmandian
Vahid Farahmandian

Reputation: 6568

  1. check if there is any index in session_event? if there is any, then disable it
  2. Use NOT EXISTS instead of NOT IN, as EXISTS has better performance than others( as @Zohar Peled write it's query)
  3. If not resolved, then run your select query separately and take a look at execution plan to see what is going to be done when you are executing Select.

Upvotes: -1

Zohar Peled
Zohar Peled

Reputation: 82474

Why not use a left join? other alternative is to use EXISTS instead of IN:

DELETE FROM Session_events
WHERE NOT EXISTS(
    SELECT 1
    FROM Session 
    WHERE Session.Session_Id = Session_events.Session_Id
)

Upvotes: 0

Related Questions