Reputation: 3
I am trying to use pyannote's models offline.
Until now I loaded and applied the model like this:
from pyannote.audio import Pipeline
access_token = 'xxxxxxxxxxx'
model = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=access_token)
path_in = 'blabla/1-137-A-32.wav'
num_speakers = 1
model(path_in,
      num_speakers=num_speakers).labels()
That works fine.
But now I followed the instructions for offline use: https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb
My directory structure is as follows:
src/
|- pyannote_offline_config.yaml
|- pyannote_pytorch_model.bin
---- YAML ----
version: 3.1.0

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: pyannote/wespeaker-voxceleb-resnet34-LM
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: src/pyannote_pytorch_model.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 12
    threshold: 0.7045654963945799
  segmentation:
    min_duration_off: 0.0
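A quick way to see which models in this config would still need network access is to check each model reference against the filesystem. The sketch below uses only the standard library and works on the already-parsed `pipeline.params` mapping. `non_local_models` and `MODEL_KEYS` are hypothetical names, and treating only `segmentation` and `embedding` as the model-bearing keys is an assumption about this particular pipeline.

```python
from pathlib import Path

# Assumption: for this pipeline, only these two keys under `pipeline.params`
# name a model (either a local file path or a Hugging Face Hub repo id).
MODEL_KEYS = ("segmentation", "embedding")


def non_local_models(pipeline_params, base_dir="."):
    """Return {key: value} for model references that are not existing local files.

    Relative paths are checked against `base_dir`, e.g. the working directory
    the pipeline will be loaded from. Anything returned here would have to come
    from the Hub (or from an already-populated cache).
    """
    base = Path(base_dir)
    not_local = {}
    for key in MODEL_KEYS:
        value = pipeline_params.get(key)
        if value is None:
            continue
        candidate = Path(value)
        if not candidate.is_absolute():
            candidate = base / candidate
        if not candidate.is_file():
            not_local[key] = value
    return not_local


if __name__ == "__main__":
    # Same values as the `pipeline.params` section of the YAML above.
    params = {
        "clustering": "AgglomerativeClustering",
        "embedding": "pyannote/wespeaker-voxceleb-resnet34-LM",
        "embedding_batch_size": 32,
        "embedding_exclude_overlap": True,
        "segmentation": "src/pyannote_pytorch_model.bin",
        "segmentation_batch_size": 32,
    }
    print(non_local_models(params))
```

Run from the folder that contains `src/`, this should report only `embedding`, since that entry is still a Hub repo id rather than a file on disk; with this config the embedding model is therefore not covered by the local `.bin` file.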
---- Loading Model ----
path_yaml = 'src/pyannote_offline_config.yaml'
model = Pipeline.from_pretrained(path_yaml)
path_in = 'blabla/1-137-A-32.wav'
num_speakers = 1
model(path_in,
      num_speakers=num_speakers).labels()
But that results in: "A pipeline must be instantiated with pipeline.instantiate(parameters) before it can be applied."
OK, next try:
---- Loading Model ----
path_yaml = 'src/pyannote_offline_config.yaml'
model = Pipeline.from_pretrained(path_yaml)
params = {
    'clustering': {
        'method': 'centroid',
        'min_cluster_size': 12,
        'threshold': 0.7045654963945799,
    },
    'segmentation': {
        'min_duration_off': 0.0,
    },
}
pipeline = model.instantiate(params)
path_in = 'blabla/1-137-A-32.wav'
num_speakers = 1
pipeline(path_in,
         num_speakers=num_speakers).labels()
But that results in the same error: "A pipeline must be instantiated with pipeline.instantiate(parameters) before it can be applied."
I don't understand the problem.
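One thing worth ruling out in both attempts is the relative `segmentation: src/pyannote_pytorch_model.bin` path. As far as I can tell from pyannote's offline-usage notebooks, relative paths in the YAML are resolved against the current working directory, not against the YAML file's folder, so the same script can behave differently depending on where it is started from. It is not certain that this causes the error above. A minimal stdlib sketch for loading from a fixed directory; `working_directory` is a hypothetical helper (Python 3.11+ also ships `contextlib.chdir` for the same job), and the commented usage assumes the script sits in the folder that contains `src/`.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily change the current working directory and always restore it."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Restore the original directory even if loading raised an exception.
        os.chdir(previous)


# Hypothetical usage (needs pyannote.audio installed, so left as comments).
# Assumes this script sits in the folder that contains `src/`, so that the
# relative path `src/pyannote_pytorch_model.bin` in the YAML resolves.
#
# from pyannote.audio import Pipeline
# project_root = Path(__file__).resolve().parent
# with working_directory(project_root):
#     model = Pipeline.from_pretrained("src/pyannote_offline_config.yaml")
```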
It works if I do it like this:
---- Loading Model ----
path_yaml = 'src/pyannote_offline_config.yaml'
model = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", path_yaml)
path_in = 'blabla/1-137-A-32.wav'
num_speakers = 1
model(path_in,
      num_speakers=num_speakers).labels()
But after pushing to GitLab, the CI test pipeline gives me: "Could not download 'pyannote/speaker-diarization-3.1' pipeline. It might be because the pipeline is private or gated so make sure to authenticate. Visit https://hf.co/settings/tokens to create your access token and retry with: Pipeline.from_pretrained('pyannote/speaker-diarization-3.1', ... use_auth_token=YOUR_AUTH_TOKEN)"
So it seems that something on my local computer is not provided by the pip install. For example, if I load it without the YAML, using only model = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1"), it also works.
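A plausible reading of this difference is that the models are already cached on the local machine, while a fresh GitLab runner starts with an empty cache and has to download them, which requires a token (exactly what the error message asks for). A minimal sketch for supplying the token in CI without hard-coding it; `read_hf_token` and the variable name `HF_TOKEN` are assumptions, with the variable defined as a masked CI/CD variable in the GitLab project settings.

```python
import os


def read_hf_token(var_name="HF_TOKEN"):
    """Return the Hugging Face token stored in an environment variable.

    Fails early with a readable message instead of letting the download fail
    later with the generic "private or gated" error.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set; "
            "define it as a masked CI/CD variable in GitLab."
        )
    return token


# Hypothetical usage (needs pyannote.audio installed, so left as comments):
# from pyannote.audio import Pipeline
# model = Pipeline.from_pretrained(
#     "pyannote/speaker-diarization-3.1", use_auth_token=read_hf_token()
# )
```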
Upvotes: 0
Views: 1869
Reputation: 1
It looks like pyannote cannot download model.bin anymore :(
But I found that it only needs authentication for the first download; after that, the script can load the models from ~/.cache.
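To check what that one-time authenticated download actually put on disk, one can look for the per-repository folders the Hugging Face Hub cache creates, named `models--<org>--<name>`. A minimal sketch; the cache root is an assumption (in the pyannote.audio 3.x releases I know of it is `~/.cache/torch/pyannote` unless `PYANNOTE_CACHE` is set, so check your installed version), and the helper names are hypothetical.

```python
import os
from pathlib import Path


def default_pyannote_cache():
    # Assumption: pyannote.audio 3.x honours PYANNOTE_CACHE and otherwise
    # falls back to ~/.cache/torch/pyannote; verify for your version.
    return Path(os.getenv("PYANNOTE_CACHE", "~/.cache/torch/pyannote")).expanduser()


def hub_cache_folder(repo_id):
    """Folder name the Hugging Face Hub cache uses for a model repo."""
    return "models--" + repo_id.replace("/", "--")


def cached_repos(repo_ids, cache_dir):
    """Map each repo id to whether a cache folder for it exists under cache_dir."""
    return {
        repo: (Path(cache_dir) / hub_cache_folder(repo)).is_dir()
        for repo in repo_ids
    }


if __name__ == "__main__":
    repos = [
        "pyannote/speaker-diarization-3.1",
        "pyannote/wespeaker-voxceleb-resnet34-LM",
    ]
    print(cached_repos(repos, default_pyannote_cache()))
```

On a fresh CI runner these folders would not exist, which fits the "Could not download" error from the question.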
Upvotes: 0
Reputation: 3
It works now. I went through the instructions again from the beginning and downloaded everything again. The model file on my computer was somehow corrupt.
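A corrupted model file is easy to miss because the resulting error need not point at it (here it surfaced as the "must be instantiated" message). One way to rule it out is to compare the file's SHA-256 with the checksum the Hugging Face Hub shows on the file's page for LFS-stored files. A minimal stdlib sketch; `sha256_of` and `matches_expected` are hypothetical helpers, and the expected hash has to be copied from the Hub by hand.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks so large models never load fully."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_sha256):
    """Compare a local file with a published checksum (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Hypothetical usage: paste the SHA256 shown on the model file's Hub page.
# print(matches_expected("src/pyannote_pytorch_model.bin", "<sha256 from the Hub>"))
```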
Upvotes: 0