Reputation: 344
I'm using Django 2.2 with a PostgreSQL database.
I have two models: Gene
and Annotation
and need to create and link (many-to-many) thousands of Genes and Annotations at the same time.
class Gene(models.Model):
identifier = models.CharField(max_length=50, primary_key=True)
annotation = models.ManyToManyField(Annotation)
class Annotation(models.Model):
name = models.CharField(max_length=120, unique=True, primary_key=True)
I already found a way to create the objects very efficiently:
Gene.objects.bulk_create([Gene(identifier=identifier) for identifier in gene_id_set])
This is my Django-docs-inspired way to create relationships:
relationships = {
'gene1': ['anno1', 'anno2'],
'gene2': ['anno3'],
...
}
for gene in relationships:
gene = Annotation.objects.get(pk='gene1')
gene.annotation_set.set([Annotation.objects.get(pk=anno) for anno in relationships[gene])
But this is very clumsy: It hits the database 4 times! Is there not a better way, using Django-built-in-tools or raw SQL queries?
The many-to-many table (myapp_gene_annotation
) looks like this:
id gene_id annotation_id
1 gene1 anno1
2 gene1 anno2
3 gene2 anno3
...
Upvotes: 2
Views: 85
Reputation: 476614
Now we can create Gene_annotation
objects: the implicit model Django has constructed for the ManyToMany
table, like:
through_model = Gene.annotation.through
objs = [
through_model(gene_id=gene_id, annotation_id=anno_id)
for gene_id, rels in relationships.items()
for anno_id in rels
]
Now we can perform a bulk insert in the table of the through_model
:
through_model.objects.bulk_create(objs)
You should of course only add the relations after you have added the Gene
s and Annotation
s, since otherwise, the foreign key constraints at the database side will raise an error.
We will here insert all the relations in one time. If the table is huge, this might result in multiple queries, but still it is more efficient than querying once per relation.
Upvotes: 1