Reputation: 155
I am using the following code to start a Web Worker that creates embeddings using the Universal Sentence Encoder:
const initEmbeddingWorker = (filePath) => {
let worker = new Worker(filePath);
worker.postMessage({init: 'init'})
worker.onmessage = (e) => {
worker.terminate();
}
}
Web Worker code:
onmessage = function (e) {
if(e.data.init && e.data.init === 'init') {
fetchData();
}
}
const fetchData = () => {
//fetches data from indexeddb
createEmbedding(data, storeEmbedding);
}
const createEmbedding = (data, callback) => {
use.load().then(model => {
model.embed(data).then(embeddings => {
callback(embeddings);
})
});
}
const storeEmbedding = (matrix) => {
let data = matrix.arraySync();
//store data in indexeddb
}
It takes 3 minutes to create 100 embeddings using 10 Web Workers running simultaneously, with each worker creating embeddings for 10 sentences. This is too slow: I need to create embeddings for more than 1000 sentences, which takes around 25 to 30 minutes. Whenever this code runs, it hogs all the resources, which makes the machine very slow and almost unusable.
Are there any performance optimizations that I am missing?
Upvotes: 5
Views: 498
Reputation: 18401
Using 10 web workers assumes that the machine running the code has at least 11 cores: one per worker plus one for the main thread.
To get the most out of web workers, each worker should run on a different core. When there are more workers than cores, the program will not be as fast as expected, because the workers compete for the same cores and a lot of time is spent switching between them instead of computing embeddings.
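As a rough sketch of the point above, the number of workers can be derived from the number of cores instead of being fixed at 10. The helper names chooseWorkerCount and splitIntoBatches are hypothetical and not part of the original code; navigator.hardwareConcurrency is the standard browser property that reports the number of logical cores, and the fallback value of 2 is an assumption for environments where it is unavailable.

```javascript
// Hypothetical helper: leave one core for the main thread and never
// start more workers than there are sentences to embed.
function chooseWorkerCount(coreCount, sentenceCount) {
  const usableCores = Math.max(1, coreCount - 1);
  return Math.max(1, Math.min(usableCores, sentenceCount));
}

// Hypothetical helper: split the sentences into contiguous batches,
// at most one batch per worker.
function splitIntoBatches(sentences, workerCount) {
  const batchSize = Math.ceil(sentences.length / workerCount);
  const batches = [];
  for (let i = 0; i < sentences.length; i += batchSize) {
    batches.push(sentences.slice(i, i + batchSize));
  }
  return batches;
}

// Logical core count in browsers; the fallback of 2 is an assumption.
const coreCount =
  (typeof navigator !== 'undefined' && navigator.hardwareConcurrency) || 2;
```

Each batch could then be posted to its own worker, so that every worker loads the model once and embeds a larger batch rather than only 10 sentences.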
Now let's look at what happens on each core.
arraySync is a blocking call, which prevents that thread from being used for anything else. Instead of arraySync, array can be used.
const storeEmbedding = async (matrix) => {
let data = await matrix.array();
//store data in indexeddb
}
array and its counterpart arraySync are slower compared to data and dataSync. It is better to store the flattened data, i.e. the output of data.
const storeEmbedding = async (matrix) => {
let data = await matrix.data();
//store data in indexeddb
}
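Because data returns a single flat typed array, the per-sentence embeddings have to be recovered later if they are needed individually. A minimal sketch, assuming the embedding dimension is known (for example from matrix.shape[1]); splitEmbeddings is a hypothetical helper, not part of the original code or of TensorFlow.js.

```javascript
// Hypothetical helper: cut a flat typed array into one embedding per sentence.
// `dim` is the embedding dimension (assumed to be known by the caller).
function splitEmbeddings(flat, dim) {
  if (dim <= 0 || flat.length % dim !== 0) {
    throw new Error('flat length must be a positive multiple of dim');
  }
  const rows = [];
  for (let i = 0; i < flat.length; i += dim) {
    // slice copies the values, so each row owns its own small buffer
    rows.push(flat.slice(i, i + dim));
  }
  return rows;
}
```

slice is used here rather than subarray so that each row can be stored on its own without dragging the whole underlying buffer along with it.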
Upvotes: 0