Reputation: 3426
I am currently working on a music visualization LED strip display (similar to this), but I am stuck on extracting audio frequency / magnitude pairs from the sample data.
The entire process involves quite a few steps, and I am not sure at which point exactly I am failing, so please bear with me. I start by reading audio from a TargetDataLine configured with an AudioFormat.
AudioSystem.getTargetDataLine(new AudioFormat(8000.0f, 16, 1, true, true));
Next, I read the data as described in this tutorial. The only difference is that I am using a smaller buffer, effectively reading 64 bytes at a time.
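For reference, a minimal sketch of what such a chunked read could look like. The class name `CaptureSketch` and the helper `readChunk` are hypothetical and not taken from the tutorial; only the `AudioFormat` parameters come from the line above. It assumes a capture device that supports this format is available.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

final class CaptureSketch {
    // Same parameters as above: 8 kHz, 16-bit, mono, signed, big-endian.
    static AudioFormat format() {
        return new AudioFormat(8000.0f, 16, 1, true, true);
    }

    // Reads until the chunk is full or the line stops delivering data.
    // Returns the number of bytes actually read.
    static int readChunk(TargetDataLine line, byte[] chunk) {
        int filled = 0;
        while (filled < chunk.length) {
            int n = line.read(chunk, filled, chunk.length - filled);
            if (n <= 0) {
                break;
            }
            filled += n;
        }
        return filled;
    }

    public static void main(String[] args) throws LineUnavailableException {
        AudioFormat format = format();
        TargetDataLine line = AudioSystem.getTargetDataLine(format);
        line.open(format);
        line.start();
        byte[] chunk = new byte[64]; // 32 samples of 2 bytes each
        int read = readChunk(line, chunk);
        System.out.println("bytes read: " + read);
        line.stop();
        line.close();
    }
}
```

The last constructor argument of `AudioFormat` is the big-endian flag, so with this format the first byte of every 2-byte frame is the most significant one.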
Here is example data read this way from a 500 Hz test tone.
final byte[] buffer = { // 32 16-bit samples
30, 111, 43, -19, 50, -74, 49, -54,
41, 70, 26, 121, 7, -90, -13, -89,
-31, -118, -44, 17, -51, 75, -50, 60,
-42, -62, -27, -115, -8, 91, 12, 85,
30, 106, 43, -34, 50, -93, 49, -76,
41, 52, 26, 108, 7, -93, -13, -84,
-31, -104, -44, 38, -51, 98, -50, 84,
-42, -42, -27, -98, -8, 99, 12, 83
};
Next, I transform the sampled bytes into an array of double values in the range [-1, 1]. I took inspiration from the aforementioned tutorial and this code.
final double[] samples = new double[buffer.length];
int sampleIndex = 0;
int byteIndex = 0;
while (byteIndex < buffer.length) {
    int low = buffer[byteIndex++];
    int high = buffer[byteIndex++];
    int sampleIntValue = ((high << 8) + (low & 0x00ff));
    double maxSampleValue = 32768;
    samples[2 * sampleIndex] = ((double) sampleIntValue) / maxSampleValue; // Transforming to [-1, 1]
    samples[(2 * sampleIndex) + 1] = 0; // Imaginary part
    sampleIndex++;
}
Next, I perform a Fast Fourier Transform on the data using the edu.emory.mathcs:JTransforms:2.4 library.
final DoubleFFT_1D fft = new DoubleFFT_1D(samples.length / 2);
fft.complexForward(samples); // in-place transform of the interleaved (real, imaginary) pairs
Then, I calculate frequencies and magnitudes as described in this code.
final float sampleRate = 8000f;
final Map<Double, Double> frequenciesToMagnitudes = IntStream.range(0, samples.length / 2)
        .boxed()
        .collect(Collectors.toMap(
                index -> 2 * ((double) index / (double) samples.length) * sampleRate,
                index -> Math.log10(
                        Math.pow(samples[2 * index], 2) // real part
                        + Math.pow(samples[(2 * index) + 1], 2) // imaginary part
                )));
I find the maximum magnitude so I can scale the displayed values accordingly.
final double maximumMagnitude = frequenciesToMagnitudes.values().stream().max(Double::compare).orElse(0d);
And finally, I display the results (taller block represents a brighter LED).
final char[] magnitudeDisplay = {'▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'};
frequenciesToMagnitudes.entrySet().stream().sorted(Map.Entry.comparingByKey()).forEach(frequencyToMagnitude -> {
    final double magnitudePercentage = frequencyToMagnitude.getValue() / maximumMagnitude;
    final int characterToDisplayIndex = Math.min(magnitudeDisplay.length - 1, Math.max(0, (int) (magnitudePercentage * magnitudeDisplay.length)));
    final char characterToDisplay = magnitudeDisplay[characterToDisplayIndex];
    System.out.print(characterToDisplay);
});
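The scaling step can be tried in isolation with a small sketch like the following. The class `SpectrumBars` and its `render` method are illustrative names, and the sketch assumes non-negative (linear) magnitudes, which is not the case for `log10` values below 1.

```java
final class SpectrumBars {
    private static final char[] BARS = {'▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'};

    // Maps each magnitude to a bar character, scaled against the largest magnitude.
    static String render(double[] magnitudes) {
        double max = 0;
        for (double m : magnitudes) {
            max = Math.max(max, m);
        }
        StringBuilder out = new StringBuilder();
        for (double m : magnitudes) {
            double fraction = max > 0 ? m / max : 0; // avoid dividing by zero on silence
            int index = Math.min(BARS.length - 1, Math.max(0, (int) (fraction * BARS.length)));
            out.append(BARS[index]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new double[] {0.0, 0.5, 1.0, 0.25})); // prints ▁▅█▃
    }
}
```

The clamp to `BARS.length - 1` matters because the largest magnitude maps to a fraction of exactly 1, which would otherwise index one past the end of the array.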
From all this, I would expect to see two distinct spikes (my frequency plus its alias region), but instead there are 8.
The number of spikes changes with the frequency of the audio being played (sometimes it is two, sometimes four), and the lower the frequency, the more spikes I see.
My question is: how do I extract frequency / magnitude pairs from audio data?
Upvotes: 0
Views: 433
Reputation: 3426
After a bit of digging, I found the OpenIMAJ library, which does exactly what I need. Here are the differences between my code and theirs.
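// Byte order: the high byte is read first (the line is configured as big-endian).
int high = buffer[byteIndex++];
int low = buffer[byteIndex++];
// The FFT size is rounded up to the next power of two.
final int fftSize = (int) Math.pow(2, 32 - Integer.numberOfLeadingZeros(numberOfSamples - 1));
// Samples are scaled up to the int range instead of down to [-1, 1].
samples[2 * sampleIndex] = (float) sampleIntValue * Integer.MAX_VALUE / Short.MAX_VALUE;
// The real parts are divided by the FFT size.
for (int i = 0; i < samples.length; i += 2) samples[i] /= fftSize;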
// Only the first half of the bins is kept; magnitudes are linear rather than log10.
final Map<Float, Float> frequenciesToMagnitudes = IntStream.range(0, samples.length / 4)
        .boxed()
        .collect(Collectors.toMap(
                index -> (((float) (index * 2)) * (sampleRate / (float) numberOfSamples)) / 2,
                index -> {
                    final float realPart = samples[index * 2];
                    final float imaginaryPart = samples[index * 2 + 1];
                    return (float) Math.sqrt(realPart * realPart + imaginaryPart * imaginaryPart);
                }));
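The byte-order change can be checked on its own. The following is a minimal sketch, not the OpenIMAJ code: it uses `java.nio.ByteBuffer` to decode signed 16-bit PCM with an explicit byte order, and normalizes to [-1, 1) as in the question rather than scaling to the int range. The class name `PcmDecoder` and the method `toComplexSamples` are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmDecoder {
    // Decodes signed 16-bit PCM into interleaved complex samples: [re0, im0, re1, im1, ...].
    static double[] toComplexSamples(byte[] pcm, boolean bigEndian) {
        ByteBuffer bytes = ByteBuffer.wrap(pcm)
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        int sampleCount = pcm.length / 2;
        double[] samples = new double[2 * sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            samples[2 * i] = bytes.getShort() / 32768.0; // real part in [-1, 1)
            samples[2 * i + 1] = 0.0;                    // imaginary part
        }
        return samples;
    }

    public static void main(String[] args) {
        byte[] firstFour = {30, 111, 43, -19}; // first two samples of the test-tone buffer
        double[] decoded = toComplexSamples(firstFour, true);
        // Raw 16-bit values when read as big-endian: 7791.0 and 11245.0
        System.out.println(decoded[0] * 32768 + " " + decoded[2] * 32768);
    }
}
```

Read as big-endian, the first samples of the test-tone buffer rise smoothly (7791, 11245, ...), while reading the same bytes as little-endian gives 28446 for the first sample.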
After all the changes, the output appears to be much more reasonable.
I am still not entirely sure if the frequency calculation is correct, though.
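On the frequency calculation: for an N-point transform of data sampled at rate fs, bin k corresponds to k * fs / N, where N is the transform length (the padded size, if the input was padded). For real-valued input only bins 0 to N/2 carry distinct information, because the upper half mirrors the lower half. The sketch below checks this on a synthetic 500 Hz tone with a naive O(N²) DFT, used here instead of JTransforms or OpenIMAJ to stay dependency-free. The class `PeakFrequency` and its methods are hypothetical names.

```java
final class PeakFrequency {
    // Magnitude of bin k of a naive DFT over real-valued samples.
    static double magnitude(double[] signal, int k) {
        int n = signal.length;
        double re = 0;
        double im = 0;
        for (int t = 0; t < n; t++) {
            double angle = 2 * Math.PI * k * t / n;
            re += signal[t] * Math.cos(angle);
            im -= signal[t] * Math.sin(angle);
        }
        return Math.hypot(re, im);
    }

    // Frequency in Hz of the strongest bin in 0..n/2.
    static double peakFrequency(double[] signal, double sampleRate) {
        int n = signal.length;
        int best = 0;
        double bestMagnitude = magnitude(signal, 0);
        for (int k = 1; k <= n / 2; k++) {
            double m = magnitude(signal, k);
            if (m > bestMagnitude) {
                best = k;
                bestMagnitude = m;
            }
        }
        return best * sampleRate / n; // bin index times bin spacing
    }

    // Generates n samples of a unit-amplitude sine wave.
    static double[] tone(double frequency, double sampleRate, int n) {
        double[] samples = new double[n];
        for (int t = 0; t < n; t++) {
            samples[t] = Math.sin(2 * Math.PI * frequency * t / sampleRate);
        }
        return samples;
    }

    public static void main(String[] args) {
        // 32 samples at 8000 Hz give a bin spacing of 250 Hz, so 500 Hz lands in bin 2.
        System.out.println(peakFrequency(tone(500, 8000, 32), 8000)); // prints 500.0
    }
}
```

With 32 samples at 8000 Hz, the resolution is 250 Hz per bin, so tones that do not fall on a multiple of 250 Hz spread across neighbouring bins.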
Upvotes: 1