Reputation: 166
Problem: I have millions of records that need to be transformed using a bunch of spacy textcat_multilabel models.
My current loop is as follows:

# pseudocode
for model in models:
    nlp = spacy.load(model)
    for group_of_records in records:  # millions of records
        new_data = nlp.pipe(group_of_records)  # data is processed in bulk
        # process data
        bulk_create_records(new_data)
As you can imagine, the more records I process and the more models I include, the longer this entire process takes. The idea is to make a single model and process my data only once, instead of (n * num_of_models) times, where n is the number of records.
Question: Is there a way to combine multiple textcat_multilabel models, created from the same spaCy config, into a single textcat_multilabel model?
Upvotes: 1
Views: 519
Reputation: 15623
There is no built-in feature to just combine models, but there are a couple of ways you can do this.
One is to source all your components into the same pipeline. This is very easy to do; see the double NER project for an example. The disadvantage is that this might not save you much processing time, since each separately trained model will still run its own tok2vec layer.
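A minimal sketch of sourcing, assuming two separately trained pipelines saved at the hypothetical paths ./model_a and ./model_b, trained with the same vocab and vectors, and each textcat using its own internal tok2vec rather than a shared listener:

import spacy

# Load the two separately trained pipelines (paths are hypothetical)
nlp = spacy.load("./model_a")
source_nlp = spacy.load("./model_b")

# Copy the trained component out of model_b into model_a's pipeline
# under a distinct name, so both run on every doc
nlp.add_pipe("textcat_multilabel", source=source_nlp, name="textcat_b")

doc = nlp("some record text")
print(doc.cats)  # scores from both components

Since both components write into doc.cats, the label scores merge cleanly as long as the two label sets don't overlap.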
You could also combine your training data and train one big model. But if your models really are separate problems, that would almost certainly cause a reduction in accuracy.
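If you go that route, here's a minimal sketch of merging the annotations, assuming both corpora are DocBin files (paths hypothetical) that annotate the same texts in the same order:

import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")

# Hypothetical corpora: same texts, annotated for different label sets
db_a = DocBin().from_disk("./train_a.spacy")
db_b = DocBin().from_disk("./train_b.spacy")

merged = DocBin()
for doc_a, doc_b in zip(db_a.get_docs(nlp.vocab), db_b.get_docs(nlp.vocab)):
    doc_a.cats.update(doc_b.cats)  # union of both label sets
    merged.add(doc_a)

merged.to_disk("./train_merged.spacy")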
If speed is the primary concern, you could train each of your textcats separately while freezing your tok2vec. That would cost some accuracy, though maybe not too much, and it would allow you to combine the textcat models in the same pipeline while removing a bunch of redundant tok2vec processing. (Of the methods listed here, this is probably the best balance of implementation complexity, speed advantage, and accuracy sacrificed.)
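A sketch of the relevant config pieces for one such training run, assuming spaCy v3.1+ (for annotating_components), a hypothetical base pipeline at ./base_model whose tok2vec you reuse, and with most other settings omitted:

# partial config.cfg

[components.tok2vec]
# reuse the tok2vec from an existing pipeline (hypothetical path)
source = "./base_model"

[components.textcat_multilabel]
factory = "textcat_multilabel"
# configure this component's model with spacy.Tok2VecListener.v1
# so it reads from the shared tok2vec above

[training]
# don't update the shared layer, but still run it during training
# so the listener gets its predictions
frozen_components = ["tok2vec"]
annotating_components = ["tok2vec"]

Each textcat trained this way can then be sourced into one pipeline next to the single shared tok2vec, so the expensive embedding step runs once per doc instead of once per model.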
One thing that I don't think has been tested is training separate textcat components at the same time, with separate sets of labels, by manually specifying the labels for each component in your config. I am not completely sure that would work, but you could try it.
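Untested, as noted, but the config shape would look something like this (component names and labels are hypothetical):

# partial config.cfg - two textcat components in a single training run

[components.textcat_a]
factory = "textcat_multilabel"

[components.textcat_b]
factory = "textcat_multilabel"

[initialize.components.textcat_a]
labels = ["BILLING", "SHIPPING"]

[initialize.components.textcat_b]
labels = ["URGENT", "NOT_URGENT"]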
Upvotes: 1