Reputation: 11
I am building an audio filtering application at work that goes through hundreds of audio files and filters them: if a file contains a human voice it is accepted, and if it does not, the file is deleted.
I am using ffmpeg to get the details of each audio file and to apply other filters such as size, duration, and silence (though its silence detection is not very accurate for all audio files).
My company asked me to try the Google Cloud Speech API to detect if the audio has any human voice in it.
With the code below, some audio files return a transcript of the words spoken in the file, but what I need is simply to determine whether a human is speaking or not.
I have considered using hark.js for this, but there does not seem to be enough documentation and I am short on time.
P.S. I am an intern and I'm just starting out with programming. I apologize if my question does not make sense or sounds dumb.
```php
<?php
# Includes the autoloader for libraries installed with composer
require __DIR__ . '/vendor/autoload.php';

# Imports the Google Cloud client library
use Google\Cloud\Speech\V1\SpeechClient;
use Google\Cloud\Speech\V1\RecognitionAudio;
use Google\Cloud\Speech\V1\RecognitionConfig;
use Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding;

putenv('GOOGLE_APPLICATION_CREDENTIALS=../../credentials.json');

echo getcwd() . "<br>";
chdir('test-sounds');
echo getcwd() . "<br>";
echo shell_exec('ls -lr');

$fileList = glob('*');
foreach ($fileList as $filename) {
    //echo $filename, '<br>';

    # The name of the audio file to transcribe
    $audioFile = __DIR__ . '/' . $filename;

    # get contents of a file into a string
    $content = file_get_contents($audioFile);

    # set string as audio content
    $audio = (new RecognitionAudio())
        ->setContent($content);

    # The audio file's encoding and language
    $config = new RecognitionConfig([
        'encoding' => AudioEncoding::LINEAR16,
        'language_code' => 'ja-JP'
    ]);

    # Instantiates a client
    $client = new SpeechClient();

    # Detects speech in the audio file
    $response = $client->recognize($config, $audio);

    # Print most likely transcription
    foreach ($response->getResults() as $result) {
        $alternatives = $result->getAlternatives();
        $mostLikely = $alternatives[0];
        $transcript = $mostLikely->getTranscript();
        printf('<br>Transcript: %s' . PHP_EOL, $transcript . '<br>');
    }

    $client->close();
}
?>
```
Upvotes: 1
Views: 1037
Reputation: 11
I was able to solve the problem myself. What I had to do was initialize `$transcript` to null before the loop. Previously, nothing happened when the API returned nothing for an audio file, so the delete step was skipped. Once `$transcript` was initialized to null, the delete condition was met in that case.
The system itself is not perfect. The idea is that if the Google Speech API returns any transcript, the system accepts the audio file; if not, the file is deleted from my system. There are several types of audio that are not accepted. Whatever the case, it met the requirements that were set for me, so that is good enough for me. I don't know if it will be helpful for anyone else.
P.S. The code from my program below looks slightly different from the code in my question, because it is taken from my actual program.
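The accept/reject rule on its own can be sketched as a small standalone function. This is only an illustration of the idea and not the code from my program: `hasSpokenWords` is a name made up for this example, and it assumes the transcripts have already been collected into a plain array of strings.

```php
<?php
// Hypothetical helper (not part of the Google client library):
// returns true if at least one transcript contains actual text.
function hasSpokenWords(array $transcripts): bool
{
    foreach ($transcripts as $transcript) {
        // Null, empty and whitespace-only transcripts count as "no speech".
        if (is_string($transcript) && trim($transcript) !== '') {
            return true;
        }
    }
    // No results at all, or only empty ones: treat the file as having no voice.
    return false;
}

var_dump(hasSpokenWords([]));                 // bool(false)
var_dump(hasSpokenWords(['', '  ']));         // bool(false)
var_dump(hasSpokenWords(['', 'こんにちは']));  // bool(true)
```

Checking every transcript, rather than only the last one seen, means a file is kept as soon as any result contains text.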
```php
try {
    # Detects speech in the audio file
    $response = $client->recognize($config, $audio);

    # Print most likely transcription
    // The below line is what did the trick:
    // $transcript stays null when the API returns no results
    $transcript = null;
    foreach ($response->getResults() as $result) {
        $alternatives = $result->getAlternatives();
        $mostLikely = $alternatives[0];
        $transcript = $mostLikely->getTranscript();
        //printf('<br>Transcript: %s' . PHP_EOL, $transcript . '<br>');
        echo "<td>" . $rowcount . "</td>";
        echo "<td>" . $filename3 . "</td>";
        echo "<td>" . $transcript . "</td>";
        echo "<td>" . "<audio controls> <source src='" . $filename3 . "' type='audio/wav'> </audio>" . "</td>";
    }

    // Loose comparison on purpose: true for null and for an empty transcript
    if ($transcript == null) {
        // echo '<br>'.$filename3.' blah <br>';
        rename($filename3, '../Trash/delete/' . $filename3);
    }
} catch (Exception $e) {
    // Do something
} finally {
    $client->close();
}
```
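The move-to-trash step can also be made a little more defensive. Below is a minimal sketch, assuming the rejected files should end up in a folder such as `../Trash/delete/` as above; `moveToTrash` and its parameters are hypothetical names for this example, built only on PHP's standard `is_file`, `is_dir`, `mkdir` and `rename` functions.

```php
<?php
// Hypothetical helper: moves a rejected audio file into a trash folder.
// Returns true on success and false if the file is missing or the move fails.
function moveToTrash(string $path, string $trashDir): bool
{
    // Nothing to move if the source file does not exist.
    if (!is_file($path)) {
        return false;
    }

    // Create the trash folder (including parent folders) if it is missing.
    if (!is_dir($trashDir) && !mkdir($trashDir, 0775, true) && !is_dir($trashDir)) {
        return false;
    }

    // Keep the original file name inside the trash folder.
    $target = rtrim($trashDir, '/') . '/' . basename($path);

    return rename($path, $target);
}
```

In the code above it could replace the bare `rename` call, for example `moveToTrash($filename3, '../Trash/delete');`, so that a failed move can be noticed through the return value instead of passing silently.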
Upvotes: 0