Albert

Reputation: 68170

Convert FFT to PCM

I have some FFT data: 257 dimensions every 10 ms, for 121 frames, i.e. 1.21 s. I think the first dimension is probably something else and the remaining 256 are the FFT coefficients. It's probably just spectrogram data. According to a comment about the FFT data, sqrt10 (presumably a 10th root) and mean-variance normalization might have been applied to it.
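Assuming "sqrt10 + MVN" means each value was computed as `y = (x^(1/10) - mean[d]) / stddev[d]` for dimension `d` (an assumption; the comment does not spell this out), undoing it requires the per-dimension mean and standard deviation that were used, which is part of the information that is missing here. A minimal sketch, with `mean` and `stddev` as hypothetical inputs that would have to come from whoever produced the data:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Undo mean-variance normalization followed by a 10th-root compression.
// ASSUMPTION: y = (x^(1/10) - mean[d]) / stddev[d] for dimension d.
// `mean` and `stddev` are hypothetical per-dimension statistics; they are
// not part of the data described in the question.
std::vector<double> undo_mvn_root10(const std::vector<float>& y,
                                    const std::vector<double>& mean,
                                    const std::vector<double>& stddev) {
    assert(!mean.empty() && stddev.size() == mean.size());
    assert(y.size() % mean.size() == 0);  // whole frames only
    const std::size_t dims = mean.size();
    std::vector<double> x(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        const std::size_t d = i % dims;                   // dimension within the frame
        const double root = y[i] * stddev[d] + mean[d];   // undo MVN
        x[i] = std::pow(root, 10.0);                      // undo the 10th root
    }
    return x;
}
```

Without the real statistics, adding a single constant (as `offset` does in the code below) can only undo the mean part, and only if one mean was shared by all dimensions.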

From there, I want to compute a PCM signal at 44.1 kHz so I can play the sound. I asked the same question in a more mathematical way here, but maybe StackOverflow is a better place because I actually want to implement this. I also asked about the theory here on DSP SE.

How would I do that? Maybe I need more information (which I would have to find out somehow); if so, which? Could the missing information be guessed intelligently?

This question is about both the theory and the practical implementation. The implementation is probably trivial, but a concrete example in some language (maybe C++ with FFTW?) would help me understand the theory. I skimmed the FFTW docs, but I don't understand all the terminology and background, e.g. here. Why is it complex to real or the other way round, when I only want real to real? What are those REDFT kinds? What's a DCT, DFT, DST? What is FFTW_HC2R?
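On the FFTW terminology: the spectrum of a real signal is complex and Hermitian-symmetric, X[n-k] = conj(X[k]), so an n-point real FFT only needs to store n/2+1 complex bins (257 for n = 512). That is why the inverse is "complex to real": FFTW's `fftw_plan_dft_c2r_1d(n, in, out, flags)` takes those n/2+1 `fftw_complex` bins and produces n real samples, unnormalized (a forward r2c followed by c2r returns the input multiplied by n). The r2r kinds are different transforms: the `FFTW_REDFT*` kinds are DCTs (`FFTW_REDFT00` is a DCT-I, which implicitly treats the data as even-symmetric), the `FFTW_RODFT*` kinds are DSTs, and `FFTW_HC2R` works on FFTW's own "halfcomplex" array layout. As a self-contained illustration that does not link FFTW, here is a naive O(n²) sketch of what the c2r inverse computes, with the 1/n normalization applied:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive inverse real DFT: n/2+1 complex bins -> n real samples.
// Mirrors what fftw_plan_dft_c2r_1d computes, except that FFTW leaves
// out the 1/n normalization, which is applied here.
std::vector<double> inverse_real_dft(const std::vector<std::complex<double>>& half,
                                     std::size_t n) {
    assert(half.size() == n / 2 + 1);
    const double pi = std::acos(-1.0);
    std::vector<double> x(n, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
        std::complex<double> sum = 0.0;
        for (std::size_t k = 0; k < n; ++k) {
            // Bins above n/2 are not stored: they are conjugates of stored ones.
            const std::complex<double> X = (k <= n / 2) ? half[k] : std::conj(half[n - k]);
            const double phase = 2.0 * pi * double(k) * double(t) / double(n);
            sum += X * std::polar(1.0, phase);
        }
        x[t] = sum.real() / double(n);  // imaginary part is ~0 for Hermitian input
    }
    return x;
}
```

Note that this needs complex bins, i.e. magnitude and phase; a real magnitude-only spectrum does not contain enough information to fill `half` correctly.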

I read all the FFT data, i.e. 121 * 257 floats, into a vector freq_bins.

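#include <cmath>
#include <vector>
#include <fftw3.h>

// Defined elsewhere in my program: float32_t and sample_t (float types),
// offset (meant to undo the normalization), dt (frame step, 10 ms),
// sampleRate (44.1 kHz) and scale.
std::vector<float32_t> freq_bins; // FFT data, 121 frames * 257 bins
const int freq_bins_count = 257;
const size_t len = 121;           // number of frames

std::vector<float32_t> pcm; // output, PCM data

const int N = freq_bins_count;
std::vector<double> out(N), orig_in(N);

// The plan is bound to out/orig_in, so create it once and reuse it for every frame.
fftw_plan q = fftw_plan_r2r_1d(N, &out[0], &orig_in[0], FFTW_REDFT00, FFTW_ESTIMATE);

// inspiration: https://stackoverflow.com/questions/2459295/invertible-stft-and-istft-in-python/6891772#6891772
for(size_t f = 0; f < len; ++f) {
    size_t pos = freq_bins_count * f;
    for(int i = 0; i < N; ++i)
        out[i] = std::pow(freq_bins[pos + i] + offset, 10);  // fft was sqrt10 + mvn
    fftw_execute(q);  // DCT-I (REDFT00) of out, written to orig_in

    // naive overlap-and-add
    auto start_frame = size_t(f * dt * sampleRate);
    for(int i = 0; i < N; ++i) {
        sample_t frame = orig_in[i] * scale / (2 * (N - 1));
        size_t idx = start_frame + i;
        if(idx >= pcm.size())
            pcm.resize(idx + 1, 0);  // grow the output buffer with zeros
        pcm[idx] += frame;
    }
}
fftw_destroy_plan(q);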

But this seems to be wrong: I just get garbage out.

This question might be related, or this one.

Upvotes: 1

Views: 841

Answers (2)

KillaKem

Reputation: 1025

If the data you have is real, it is most probably spectrogram data; if the data you are receiving is complex, you most probably have raw short-time Fourier transform (STFT) data (see the diagram in this post for how STFT/spectrogram data is produced). Spectrogram data is produced by taking the magnitude squared of the STFT data, so it is not invertible: all the phase information in the audio signal has been lost. Raw STFT data, on the other hand, is invertible, so if that is what you have, you might want to look for a library that implements the inverse STFT and try using that.
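A quick way to see that the phase is lost: circularly shifting a signal multiplies each DFT bin by a unit-magnitude phase factor, so the shifted and unshifted signals have identical magnitude-squared spectra even though they are different signals. A small sketch with a naive DFT (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// |X[k]|^2 of a real signal, computed with a naive DFT.
std::vector<double> power_spectrum(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<double> p(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> X = 0.0;
        for (std::size_t t = 0; t < n; ++t)
            X += x[t] * std::polar(1.0, -2.0 * pi * double(k) * double(t) / double(n));
        p[k] = std::norm(X);  // squared magnitude; the phase arg(X) is discarded
    }
    return p;
}

// Circularly shift x to the right by s samples.
std::vector<double> circular_shift(const std::vector<double>& x, std::size_t s) {
    const std::size_t n = x.size();
    std::vector<double> y(n);
    for (std::size_t t = 0; t < n; ++t) y[(t + s) % n] = x[t];
    return y;
}
```

For example, `{1, 2, 3, 0, 0, 0, 0, 0}` and its shift by 3 samples produce the same `power_spectrum`, so no inverse can tell from the spectrum alone which of the two signals it came from.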

As for what the FFT dimensions in your data represent, I reckon the 257 data points you receive every 10 ms are the result of a 512-point FFT used in the STFT process. The first sample is the 0 Hz (DC) bin, and the remaining 256 data points cover the rest of one half of the FFT spectrum, up to the Nyquist frequency (half the sample rate). The other half of the FFT output has been discarded because the input to the FFT is real, so one half of the FFT output is simply the complex conjugate of the other half.
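The bin arithmetic behind that reading can be sketched as follows. The original sample rate of the data is not known, so the 16 kHz used in the example values is purely hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// A real-input FFT of even size n yields n/2 + 1 unique bins:
// bin 0 is DC (0 Hz) and bin n/2 is the Nyquist frequency fs/2.
constexpr std::size_t fft_size_from_bins(std::size_t bins) {
    return 2 * (bins - 1);  // assumes an even FFT size
}

// Center frequency (Hz) of bin k for an n-point FFT at sample rate fs.
constexpr double bin_frequency(std::size_t k, std::size_t n, double fs) {
    return double(k) * fs / double(n);
}

static_assert(fft_size_from_bins(257) == 512, "257 bins imply a 512-point FFT");
```

With a hypothetical fs of 16 kHz, `bin_frequency(1, 512, 16000.0)` is 31.25 Hz and `bin_frequency(256, 512, 16000.0)` is 8000 Hz. The true sample rate is one of the pieces of information needed before resampling the result to 44.1 kHz.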

In addition, just because you receive FFT data every 10 ms, 121 times, does not mean the audio signal is 1.21 s long. The STFT is usually computed with overlapping windows, so your audio signal might be shorter than 1.21 s.
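To illustrate what overlapping windows mean for reconstruction: with a hop of `hop` samples and a window of `n > hop` samples, consecutive frames share `n - hop` samples, so frames have to be overlap-added and divided by the summed window weights. A minimal weighted overlap-add sketch (plain C++, no FFT; `hann`, `frame_signal` and `overlap_add` are hypothetical helpers) that splits a signal into Hann-windowed frames and reassembles it:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann window of length n.
std::vector<double> hann(std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<double> w(n);
    for (std::size_t m = 0; m < n; ++m)
        w[m] = 0.5 - 0.5 * std::cos(2.0 * pi * double(m) / double(n));
    return w;
}

// Split x into windowed frames of length n, advancing by hop samples.
std::vector<std::vector<double>> frame_signal(const std::vector<double>& x,
                                              std::size_t n, std::size_t hop) {
    assert(x.size() >= n && hop > 0);
    const std::vector<double> w = hann(n);
    std::vector<std::vector<double>> frames;
    for (std::size_t start = 0; start + n <= x.size(); start += hop) {
        std::vector<double> f(n);
        for (std::size_t m = 0; m < n; ++m) f[m] = w[m] * x[start + m];
        frames.push_back(f);
    }
    return frames;
}

// Weighted overlap-add: window each frame again, sum the frames at their
// hop offsets, and divide by the summed squared window so that
// overlapping regions are not over-counted.
std::vector<double> overlap_add(const std::vector<std::vector<double>>& frames,
                                std::size_t hop) {
    if (frames.empty()) return {};
    const std::size_t n = frames[0].size();
    const std::vector<double> w = hann(n);
    const std::size_t len = (frames.size() - 1) * hop + n;
    std::vector<double> y(len, 0.0), weight(len, 0.0);
    for (std::size_t f = 0; f < frames.size(); ++f)
        for (std::size_t m = 0; m < n; ++m) {
            y[f * hop + m] += w[m] * frames[f][m];
            weight[f * hop + m] += w[m] * w[m];
        }
    for (std::size_t i = 0; i < len; ++i)
        if (weight[i] > 1e-12) y[i] /= weight[i];  // samples with zero weight stay 0
    return y;
}
```

In the question's setting, each frame would first have to come out of an inverse FFT (which needs the phase), `hop` would be 10 ms worth of samples at the data's original sample rate, and `n` would be the FFT size (512 under the reading above). Writing `n` samples per frame while advancing by a different, unrelated amount, without any window normalization, is one way to end up with garbled output.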

Upvotes: 2

datenwolf
datenwolf

Reputation: 162194

You'd simply push the data you have through the inverse Fourier transform. All FFT libraries offer forward and backward transform functions.

Upvotes: 0
