Tütü

Reputation: 3

Pyannote: Load and Apply Speaker Diarization Offline

I am trying to use pyannote's models offline.

I was loading and applying models like this:

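from pyannote.audio import Pipeline

access_token = 'xxxxxxxxxxx'

# Download (or load from cache) the gated pipeline from the Hugging Face Hub
model = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=access_token)

path_in = 'blabla/1-137-A-32.wav'

num_speakers = 1

# Run diarization and list the speaker labels that were found
model(path_in,
      num_speakers=num_speakers).labels()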

That works fine.

Next, I followed the instructions for offline use: https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb

My directory structure is as follows:

src/
 |- pyannote_offline_config.yaml
 |- pyannote_pytorch_model.bin

---- YAML ----

version: 3.1.0

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: pyannote/wespeaker-voxceleb-resnet34-LM
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: src/pyannote_pytorch_model.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 12
    threshold: 0.7045654963945799
  segmentation:
    min_duration_off: 0.0

---- Loading Model ----

path_yaml = 'src/pyannote_offline_config.yaml'

model = Pipeline.from_pretrained(path_yaml)

path_in = 'blabla/1-137-A-32.wav'

num_speakers = 1

model(path_in,
         num_speakers=num_speakers).labels()

But that results in: "A pipeline must be instantiated with pipeline.instantiate(parameters) before it can be applied."

OK, next try, this time calling instantiate explicitly with the same parameters as in the YAML:

---- Loading Model ----

path_yaml = 'src/pyannote_offline_config.yaml'

model = Pipeline.from_pretrained(path_yaml)

params = {'clustering':
    {'method': 'centroid',
    'min_cluster_size': 12,
    'threshold': 0.7045654963945799},
  'segmentation':
    {'min_duration_off': 0.0}}


pipeline = model.instantiate(params)

path_in = 'blabla/1-137-A-32.wav'

num_speakers = 1

pipeline(path_in,
         num_speakers=num_speakers).labels()

But that results in the same error: "A pipeline must be instantiated with pipeline.instantiate(parameters) before it can be applied."

I don't understand the problem.

It works if I do it like this:

---- Loading Model ----

path_yaml = 'src/pyannote_offline_config.yaml'

model = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", path_yaml)

path_in = 'blabla/1-137-A-32.wav'

num_speakers = 1

model(path_in,
         num_speakers=num_speakers).labels()

But after pushing to GitLab, the test pipeline fails with: "Could not download 'pyannote/speaker-diarization-3.1' pipeline. It might be because the pipeline is private or gated so make sure to authenticate. Visit https://hf.co/settings/tokens to create your access token and retry with: Pipeline.from_pretrained('pyannote/speaker-diarization-3.1', ... use_auth_token=YOUR_AUTH_TOKEN)"

So it seems that something on my local computer is not provided by the pip install. For example, if I load the pipeline without the YAML, using only model = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1"), it also works locally.

Upvotes: 0

Views: 1869

Answers (2)

R4L

Reputation: 1

It looks like pyannote can no longer download model.bin :(

However, I found that authentication is only needed for the first download; after that, the script can load the models from ~/.cache.
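To check whether a machine (for example a CI runner) actually has those cached files, something like the following may help. It is only a sketch: the helper names enable_offline_mode and cached_files are made up for illustration, and the cache locations mentioned afterwards are assumptions about a default setup. HF_HUB_OFFLINE is the environment variable that huggingface_hub reads to avoid network calls; it has to be set before the library is imported.

```python
import os
from pathlib import Path


def enable_offline_mode():
    """Ask huggingface_hub to use only the local cache (set before importing it)."""
    os.environ["HF_HUB_OFFLINE"] = "1"


def cached_files(cache_dir):
    """Return all regular files below cache_dir, or an empty list if it is missing."""
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return []
    return sorted(p for p in cache_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    enable_offline_mode()
    # Assumed default locations; adjust them to your environment.
    for location in ("~/.cache/huggingface/hub", "~/.cache/torch/pyannote"):
        print(location, len(cached_files(location)), "files")
```

If both locations report 0 files on the test runner, the pipeline has nothing to load from and will try to download, which matches the error in the question.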

Upvotes: 0

Tütü

Reputation: 3

It works now. I went through the instructions again from the beginning and downloaded everything again. The model file on my computer was somehow corrupt.
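For anyone who wants to rule out a damaged download before debugging the YAML, here is a minimal sketch using only the standard library. The function names sha256_of and check_model_file are hypothetical, and the expected checksum is something you would have to obtain yourself from a known-good copy of the file.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_model_file(path, expected_sha256=None):
    """Return a list of problems found with a local model file (empty if none)."""
    path = Path(path)
    if not path.is_file():
        # A relative path is resolved against the current working directory.
        return [f"{path} does not exist"]
    problems = []
    if path.stat().st_size == 0:
        problems.append(f"{path} is empty")
    if expected_sha256 is not None and sha256_of(path) != expected_sha256.lower():
        problems.append(f"{path} does not match the expected SHA-256")
    return problems


if __name__ == "__main__":
    # Path taken from the question; run this from the project root.
    print(check_model_file("src/pyannote_pytorch_model.bin"))
```

An empty list only means the file exists, is non-empty, and (if a checksum was given) matches it. It does not prove that the pipeline will load.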

Upvotes: 0
