Reputation: 2581

What's the fastest way to copy data from one table to another in Django?

I have two models -

ChatCurrent - (which stores the messages for the current active chats)
ChatArchive - (which archives the messages for the chats that have ended)

The reason I'm doing this is to keep the number of rows in the ChatCurrent table to a minimum, so that querying it stays fast (I don't know if this actually works; please let me know if I've got this wrong).

So I basically want to move (copy, then delete) data from the ChatCurrent model to the ChatArchive model. What would be the fastest way to do this? From what I've read online, it seems I might have to execute a raw SQL query. If you would be kind enough to state the query as well, I'd be grateful.

Additional details - Both the models have the same schema.

Upvotes: 4

Answers (5)

Ganesh Ghalame

Reputation: 21

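import logging

logger = logging.getLogger(__name__)

def copyRecord(self, recordId):
    emailDetail = EmailDetail.objects.get(id=recordId)
    copyEmailDetail = CopyEmailDetail()
    # Copy model fields only; __dict__ also holds internal attributes such as _state.
    for field in emailDetail._meta.fields:
        setattr(copyEmailDetail, field.attname, getattr(emailDetail, field.attname))
    copyEmailDetail.save()
    logger.info("Record copied: %s", copyEmailDetail.id)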

Upvotes: 1

dani herrera

Reputation: 51665

In my opinion, there is no reason today to denormalize the database in this way to improve performance. Indexes, or partitioning plus indexes, should be enough.
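To illustrate the index-only approach, here is a minimal sketch that keeps every message in one table and relies on a composite index to find the active ones. It uses Python's built-in sqlite3 module so it runs anywhere; the table name `chat`, the columns `room_id`, `closed` and `body`, and the index name are all assumptions made up for the example, not taken from the question.

```python
import sqlite3

# Hypothetical single-table schema: a "closed" flag replaces the archive table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chat ("
    " id INTEGER PRIMARY KEY,"
    " room_id INTEGER NOT NULL,"
    " closed INTEGER NOT NULL DEFAULT 0,"
    " body TEXT NOT NULL)"
)
# Composite index so lookups of active messages in one room avoid a full scan.
conn.execute("CREATE INDEX idx_chat_closed_room ON chat (closed, room_id)")

# 5000 messages across 50 rooms; only every tenth message is still active.
rows = [(i % 50, 0 if i % 10 == 0 else 1, "message %d" % i) for i in range(5000)]
conn.executemany("INSERT INTO chat (room_id, closed, body) VALUES (?, ?, ?)", rows)
conn.commit()

query = "SELECT id, body FROM chat WHERE closed = 0 AND room_id = ?"

# EXPLAIN QUERY PLAN reports whether SQLite searches via the index.
plan = " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + query, (10,)))
active = conn.execute(query, (10,)).fetchall()

print(plan)
print(len(active))
```

In a Django model the equivalent would presumably be an index declared on the same pair of fields, but the point is only that the database can skip the archived rows without them living in a second table.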

Also, if for semantic reasons you prefer to have two tables (models), such as Chat and ChatHistory (or ChatCurrent and ChatArchive, as you say), and manage them with Django, I think the right way to keep things consistent is to create a ToArchive() method on ChatCurrent. This method moves chat entries to the historical chat model. You can perform this operation in the background by running the move in a Celery task, so that online users do not have to wait on the request. Inside the Celery task, the fastest way to copy the data is raw SQL. Remember that you can encapsulate the SQL in a stored procedure.

Edited to include reply to your comment

You can call ChatCurrent.ToArchive() from the ChatCurrent.save() method:

from django.db import connection, models, transaction

class ChatCurrent(models.Model):
    closed = models.BooleanField()

    def save(self, *args, **kwargs):
        super(ChatCurrent, self).save(*args, **kwargs)
        if self.closed:
            self.ToArchive()

    def ToArchive(self):
        # atomic() replaces commit_unless_managed(), which was removed in Django 1.8
        with transaction.atomic():
            cursor = connection.cursor()
            cursor.execute("insert into blah blah")
        #self.delete()  #if needed (perhaps deleted on raw sql)

Upvotes: 4

Marcin

Reputation: 49846

As per the above solutions, don't copy over.

If you really want to have two separate tables to query, store your chats in a single table (preferably using the database techniques mentioned in the other answers), and then have a Current and an Archive table whose objects simply point to Chat objects.
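A rough sketch of that layout, using Python's built-in sqlite3 module so it is runnable as-is. The table names `chat`, `chat_current` and `chat_archive`, the `body` column, and the two helper functions are hypothetical names chosen for the example. Archiving only moves a small pointer row; the message itself is never copied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE chat (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE chat_current (chat_id INTEGER PRIMARY KEY REFERENCES chat(id));
CREATE TABLE chat_archive (chat_id INTEGER PRIMARY KEY REFERENCES chat(id));
""")


def add_chat(conn, body):
    """Store the message once and mark it as current."""
    with conn:  # commits on success, rolls back on exception
        chat_id = conn.execute(
            "INSERT INTO chat (body) VALUES (?)", (body,)
        ).lastrowid
        conn.execute("INSERT INTO chat_current (chat_id) VALUES (?)", (chat_id,))
    return chat_id


def archive_chat(conn, chat_id):
    """Move only the pointer row; the row in chat stays where it is."""
    with conn:
        conn.execute("INSERT INTO chat_archive (chat_id) VALUES (?)", (chat_id,))
        conn.execute("DELETE FROM chat_current WHERE chat_id = ?", (chat_id,))


first = add_chat(conn, "hello")
second = add_chat(conn, "bye")
archive_chat(conn, first)

current = [r[0] for r in conn.execute(
    "SELECT c.body FROM chat c JOIN chat_current cc ON cc.chat_id = c.id")]
archived = [r[0] for r in conn.execute(
    "SELECT c.body FROM chat c JOIN chat_archive ca ON ca.chat_id = c.id")]

print(current, archived)
```

In Django terms the two pointer tables would presumably be models with a one-to-one relation to Chat, but that mapping is my assumption rather than something spelled out here.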

Upvotes: 0

mulander

Reputation: 196

The thing you are trying to do is table partitioning. Most databases support this feature without the need for manual bookkeeping.

Partitioning will also yield much better results than manually moving parts of the data to a different table. By using partitioning you avoid:

- Data inconsistency, which is easy to introduce because you move records in bulk and then remove a lot of them from the source table. It's easy to make a mistake and copy only a portion of the data.
- A performance drop: moving the data around, and the associated transaction overhead, will generally negate any benefit you got from reducing the size of the ChatCurrent table.

For a really quick rundown: table partitioning allows you to tell the database that parts of the data are stored and retrieved together. This significantly speeds up queries, as the database knows that it only has to look into a specific part of the data set. Examples: chats from the current day, last hour, last month, etc. You can additionally store each partition on a different drive; that way you can keep your current chatter on a fast SSD and your history on regular, slower disks.

Please refer to your database manual for details on how it handles partitioning.

Example for PostgreSQL: http://www.postgresql.org/docs/current/static/ddl-partitioning.html

Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:

Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory.

When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table.

Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE NO INHERIT and DROP TABLE are both far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.

Seldom-used data can be migrated to cheaper and slower storage media.

Upvotes: 3

Kousalik

Reputation: 3137

Try something like this:

INSERT INTO "ChatArchive" ("column1", "column2", ...)
SELECT "column1", "column2", ...
FROM "ChatCurrent" WHERE yourCondition;

and then just

DELETE FROM "ChatCurrent" WHERE yourCondition;
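If it helps, here is a small runnable sketch of the same two statements wrapped in a single transaction, so that a failure between the copy and the delete leaves both tables untouched. It uses Python's built-in sqlite3 module rather than Django; the columns `chat_id` and `body`, the sample rows, and the `archive_chat` helper are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ChatCurrent (id INTEGER PRIMARY KEY, chat_id INTEGER, body TEXT);
CREATE TABLE ChatArchive (id INTEGER PRIMARY KEY, chat_id INTEGER, body TEXT);
""")
conn.executemany(
    "INSERT INTO ChatCurrent (chat_id, body) VALUES (?, ?)",
    [(1, "hi"), (1, "bye"), (2, "still talking")],
)
conn.commit()


def archive_chat(conn, chat_id):
    """Copy, then delete, inside one transaction; returns the number of rows moved."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            'INSERT INTO "ChatArchive" (chat_id, body) '
            'SELECT chat_id, body FROM "ChatCurrent" WHERE chat_id = ?',
            (chat_id,),
        )
        deleted = conn.execute(
            'DELETE FROM "ChatCurrent" WHERE chat_id = ?', (chat_id,)
        ).rowcount
    return deleted


moved = archive_chat(conn, 1)
remaining = conn.execute('SELECT COUNT(*) FROM "ChatCurrent"').fetchone()[0]
archived = conn.execute('SELECT COUNT(*) FROM "ChatArchive"').fetchone()[0]

print(moved, remaining, archived)
```

Inside Django you would run the same two statements through `connection.cursor()` within a `transaction.atomic()` block instead of a raw sqlite3 connection.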

Upvotes: 3
