Saad

Parsing LIUM Speaker Diarization Output

How can I find out how long each speaker spoke, using the LIUM Speaker Diarization toolkit?

For example, this is my .seg file.

;; cluster S0 [ score:FS = -33.93166562542459 ] [ score:FT = -34.24966646974656 ] [ score:MS = -34.05223781565528 ] [ score:MT = -34.32834794609819 ]
Seq06 1 0 237 F S U S0
Seq06 1 2960 278 F S U S0
;; cluster S1 [ score:FS = -33.33289449700619 ] [ score:FT = -33.64489165914674 ] [ score:MS = -32.71833169822944 ] [ score:MT = -33.380835069917275 ]
Seq06 1 238 594 M S U S1
Seq06 1 1327 415 M S U S1
Seq06 1 2311 649 M S U S1
;; cluster S2 [ score:FS = -33.354874450638064 ] [ score:FT = -33.46618707052516 ] [ score:MS = -32.70702429201772 ] [ score:MT = -33.042146088874844 ]
Seq06 1 832 495 M S U S2
Seq06 1 1742 569 M S U S2

How can I extract the times from this file?

Answers (1)

Nikolay Shmyrev

Take this line:

Seq06 1 2960 278 F S U S0

Its fields are:

field 1: Seq06 = the show name
field 2: 1 = the channel number
field 3: 2960 = the start of the segment (in features)
field 4: 278 = the length of the segment (in features)
field 5: F = the speaker gender (U=unknown, F=female, M=male)
field 6: S = the type of band (T=telephone, S=studio)
field 7: U = the type of environment (music, speech only, …)
field 8: S0 = the speaker label
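To make the field layout concrete, here is a minimal Python sketch that splits one segment line into those eight fields. The `Segment` class and `parse_seg_line` function are hypothetical names for illustration, not part of LIUM. The sketch assumes each `;;` cluster comment sits on a single line, which appears to be how LIUM writes them (the wrapped comment lines in the question look like a copy-paste artifact).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical container for one .seg line; field order follows the list above.
    show: str         # field 1
    channel: int      # field 2
    start: int        # field 3, in features
    length: int       # field 4, in features
    gender: str       # field 5: U, F or M
    band: str         # field 6: T (telephone) or S (studio)
    environment: str  # field 7
    speaker: str      # field 8


def parse_seg_line(line: str):
    """Parse one .seg line; return None for blank lines and ';;' comments."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    show, channel, start, length, gender, band, env, speaker = fields
    return Segment(show, int(channel), int(start), int(length),
                   gender, band, env, speaker)


seg = parse_seg_line("Seq06 1 2960 278 F S U S0")
print(seg.start, seg.length, seg.speaker)  # 2960 278 S0
```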

Times are in features; divide by 100 to convert features to seconds, so a start of 2960 is 29.60 seconds. The length is also in features, so this segment lasts 2.78 seconds.
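Combining the field layout with the divide-by-100 rule answers the original question: sum the lengths (field 4) per speaker label (field 8), then convert to seconds. Below is a minimal Python sketch run on the data lines of the question's .seg file; `speaker_durations` is a hypothetical helper name, and the divisor is simply hard-coded from the rule above.

```python
from __future__ import annotations

from collections import defaultdict

FEATURES_PER_SECOND = 100  # divide-by-100 rule from the answer above


def speaker_durations(seg_text: str) -> dict[str, float]:
    """Total speaking time in seconds per speaker label, from .seg text."""
    totals: dict[str, int] = defaultdict(int)
    for line in seg_text.splitlines():
        line = line.strip()
        # Skip blank lines and ';;' cluster comments.
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        length, speaker = int(fields[3]), fields[7]  # field 4 and field 8
        totals[speaker] += length
    return {spk: n / FEATURES_PER_SECOND for spk, n in totals.items()}


# Data lines from the question's .seg file (comment lines would be skipped anyway).
SEG = """\
Seq06 1 0 237 F S U S0
Seq06 1 2960 278 F S U S0
Seq06 1 238 594 M S U S1
Seq06 1 1327 415 M S U S1
Seq06 1 2311 649 M S U S1
Seq06 1 832 495 M S U S2
Seq06 1 1742 569 M S U S2
"""

for spk, secs in sorted(speaker_durations(SEG).items()):
    print(f"{spk}: {secs:.2f} s")
# S0: 5.15 s
# S1: 16.58 s
# S2: 10.64 s
```

If you need a timeline rather than totals, `start / 100` and `(start + length) / 100` on each line give that segment's start and end in seconds.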

This is documented in the LIUM wiki.
