Reputation: 7071
Is it possible to compare a voice with an already recorded voice on the phone? Based on the comparison we could give a rating like Good, Very Good, Excellent, etc., where the closest match gets the highest rating.
Does anybody know whether this is possible in Android?
Help is highly appreciated.
Upvotes: 18
Views: 8063
Reputation: 1420
Using the musicg library you can compare two voice files in .wav format. Load each file into a Wave object, then call getFingerprintSimilarity() on one Wave, passing the other, to get a FingerprintSimilarity result. But you should know that musicg deals only with .wav files, so if you have an .mp3 file, for example, you need to convert it to a wave file first.
Android Gradle dependency:
implementation group: 'com.github.fracpete', name: 'musicg', version: '1.4.2.2'
For more, see:
https://github.com/loisaidasam/musicg
sample code:
// Wave is com.musicg.wave.Wave; FingerprintSimilarity is com.musicg.fingerprint.FingerprintSimilarity
private void compareTempFiles() {
    String dir = Environment.getExternalStorageDirectory().getAbsolutePath();
    Wave w1 = new Wave(dir + "/sample1.wav");
    Wave w2 = new Wave(dir + "/sample2.wav");
    Log.d("Compare", "Wave 1 = " + w1.getWaveHeader());
    Log.d("Compare", "Wave 2 = " + w2.getWaveHeader());

    // Compare the two fingerprints
    FingerprintSimilarity fps = w2.getFingerprintSimilarity(w1);
    float score = fps.getScore();
    float similarity = fps.getSimilarity();

    tvSim.setText("Similarity = " + similarity + "\nScore = " + score);
    Log.d("Compare", "Score = " + score);
    Log.d("Compare", "Similarity = " + similarity);
}
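To turn the similarity value into the kind of rating the question asks for, a simple threshold mapping could look like the sketch below. The cut-off values are arbitrary assumptions (not part of musicg) and should be tuned on real recordings:

```java
public class Rating {

    // Map a similarity value (roughly 0..1) to a human-readable label.
    // The thresholds are arbitrary assumptions; tune them for your data.
    public static String rate(float similarity) {
        if (similarity >= 0.8f) return "Excellent";
        if (similarity >= 0.6f) return "Very Good";
        if (similarity >= 0.4f) return "Good";
        return "Poor";
    }

    public static void main(String[] args) {
        System.out.println(rate(0.85f)); // prints Excellent
        System.out.println(rate(0.5f));  // prints Good
    }
}
```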
Upvotes: 0
Reputation: 57619
For a general audio-processing library I can recommend Marsyas. Unfortunately, the official home page is currently down.
Marsyas even provides a sample Android application. After setting up a proper signal-analysis framework, you need to analyse your signal. For example, the AimC implementation for Marsyas can be used to compare voices.
I recommend installing Marsyas on your computer and fiddling with the Python example scripts.
For your voice analysis, you could use a network like this:
vqNetwork = ["Series/vqlizer", [
    "AimPZFC/aimpzfc",
    "AimHCL/aimhcl",
    "AimLocalMax/aimlocalmax",
    "AimSAI/aimsai",
    "AimBoxes/aimBoxes",
    "AimVQ/vq",
    "Gain/g",
]]
This network takes your audio data and transforms it the way it would be processed by a human ear. After that it uses vector quantization to reduce the many possible vectors to a specific codebook with 200 entries. You can then translate the output of the network into readable characters (UTF-8, for example), which you can compare using something like a string edit distance (e.g. the Levenshtein distance).
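The final comparison step can be sketched in plain Java. This is a standard dynamic-programming Levenshtein distance, not tied to Marsyas; the input strings stand in for the codebook symbols the network would emit:

```java
public class Levenshtein {

    // dp[i][j] = edit distance between the first i chars of a
    // and the first j chars of b.
    public static int distance(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) dp[i][0] = i;
        for (int j = 0; j <= b.length(); j++) dp[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                dp[i][j] = Math.min(
                        Math.min(dp[i - 1][j] + 1,      // deletion
                                 dp[i][j - 1] + 1),     // insertion
                        dp[i - 1][j - 1] + cost);       // substitution
            }
        }
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // Pretend these are codebook indices mapped to characters.
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

A smaller distance between the two symbol strings would then translate into a better rating.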
Another possibility is to use MFCCs (Mel-Frequency Cepstral Coefficients), which Marsyas supports as well, and something like Dynamic Time Warping to compare the outputs. This document describes the process pretty well.
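As a minimal sketch of the Dynamic Time Warping step (plain Java, independent of Marsyas): each `double[]` below stands in for one frame of MFCC coefficients, and the algorithm finds the cheapest alignment between the two sequences:

```java
public class Dtw {

    // Euclidean distance between two feature frames.
    static double frameDist(double[] x, double[] y) {
        double s = 0;
        for (int k = 0; k < x.length; k++) {
            double d = x[k] - y[k];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    // Standard DTW: cost[i][j] = local distance plus the cheapest
    // of the three predecessor cells (match, insert, delete).
    public static double distance(double[][] a, double[][] b) {
        int n = a.length, m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = frameDist(a[i - 1], b[j - 1]);
                cost[i][j] = d + Math.min(cost[i - 1][j - 1],
                                 Math.min(cost[i - 1][j], cost[i][j - 1]));
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        // One-dimensional toy "MFCC" sequences of different lengths.
        double[][] a = {{0}, {1}, {2}, {3}};
        double[][] b = {{0}, {1}, {1}, {2}, {3}};
        System.out.println(distance(a, b)); // prints 0.0 (perfect alignment)
    }
}
```

A lower DTW cost means the two recordings are closer, which again maps naturally onto a rating scale.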
Upvotes: 3