Reputation: 75
To begin with, it so happens that I am a guitarist with little programming knowledge (to be honest, I'm more interested in the subject than actually able to do much with it). The other day I was studying how neural networks work, which led me to reading about the use of video cards for mathematical computation. Programs like Guitar Rig and the like always have some latency. As I understand it, transforming sound in such programs is just a series of mathematical operations performed on the audio (yes, that may be put too simply and inaccurately, but I think the essence is correct). So I wondered: is there a way to implement something like this where the GPU is used for the computation instead of the CPU? Especially considering that, for example, when rendering video games, the latency on video cards is minimal, even though that is a more resource-intensive operation than audio processing.
I know there are various libraries for harnessing the power of graphics cards, but I'm not sure whether that's what I need. I found only two materials related to this in one way or another, and as I said earlier, I'm not sure how well they work. If you know good libraries, please share them. https://archive.codeplex.com/?p=cudafy
https://www.codeproject.com/Articles/1116907/How-to-Use-Your-GPU-in-NET
Also, I thought I would most likely try to implement this in C#.
Upvotes: 6
Views: 6125
Reputation: 5203
The question is rather vague about the usage scenarios it envisages, but to give a slightly more elaborate answer than the accepted answer's "just can't do it, mate": there is always a trade-off between latency and bandwidth, and which one matters more depends on the application. Real-time processing during a live performance is at one end of that spectrum; DAW-like post-processing of a recording is at the other.
Here's an illustrative table from a recent paper (Renney, Gaster & Mitchell, "There and Back Again: The Practicality of GPU Accelerated Digital Audio", NIME 2020), showing this trade-off when using GPUs for audio synthesis.
[Table from the paper: physical model synthesizer bidirectional real-time test]
You get squeezed by latency and jitter from one end (at high buffer sizes) and by lack of bandwidth from the other end (at low buffer sizes).
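To put rough numbers on that squeeze (my own back-of-the-envelope arithmetic, not figures from the paper): the per-buffer deadline is simply buffer size divided by sample rate, and the whole upload-process-download round trip has to fit inside it.

    // Per-buffer round-trip budget: everything (upload, processing, download)
    // must complete within buffer_size / sample_rate, or you get a dropout.
    #include <cstdio>

    int main() {
        const double sample_rate = 48000.0;               // typical interface rate
        const int buffer_sizes[] = {32, 64, 128, 256, 512, 1024};
        for (int n : buffer_sizes) {
            double budget_ms = 1000.0 * n / sample_rate;
            std::printf("%4d samples -> %5.2f ms budget per buffer\n", n, budget_ms);
        }
        // 32 samples leaves ~0.67 ms for the whole round trip; 1024 samples
        // leaves ~21 ms, but adds ~21 ms of buffering latency by itself.
        return 0;
    }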
Using pinned memory for transfers makes a massive latency difference with some discrete GPUs, especially from the AMD family, as one of the paper's graphs shows. Pinned memory makes almost no difference with on-die (integrated) GPUs, which are already the kings of the latency department, but those generally offer lower internal bandwidth and processing power.
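To make the pinned-memory point concrete, here is a minimal sketch of my own (not code from the paper) using the CUDA runtime API; the only change from the ordinary case is allocating the host-side audio buffer with cudaMallocHost, which page-locks it so the DMA engine can transfer it directly:

    // Pageable vs. pinned host buffers for an audio-sized transfer.
    #include <cuda_runtime.h>
    #include <cstdlib>

    int main() {
        const size_t n = 256;                    // one small audio buffer
        const size_t bytes = n * sizeof(float);

        float* pageable = static_cast<float*>(std::malloc(bytes)); // ordinary heap memory
        float* pinned = nullptr;
        cudaMallocHost((void**)&pinned, bytes);  // page-locked ("pinned") host memory

        float* device = nullptr;
        cudaMalloc((void**)&device, bytes);

        // Same call, different latency: the pageable copy is staged through an
        // internal pinned buffer, the pinned copy is DMA'd directly.
        cudaMemcpy(device, pageable, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(device, pinned, bytes, cudaMemcpyHostToDevice);

        cudaFree(device);
        cudaFreeHost(pinned);
        std::free(pageable);
        return 0;
    }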
There's also the issue of API-induced overhead, which basically counts as latency for application purposes. As the paper found, OpenCL is somewhat worse than CUDA in that regard (on Nvidia cards).
As for your actual question of what's out there... not that much. TL;DR: among the well-known platforms there's experimental CUDA support in Csound 6, distributed only as source. (And that's not C#, if I need to say it.)
ROLI was envisaging SOUL as a would-be CUDA-like API (and actually a language) for running audio tasks on DSPs, much like you have accelerated graphics on the GPU. Alas, ROLI went bankrupt recently and SOUL is in a deep freeze right now. That may be a hint that there isn't much of a market for this kind of PC-based hardware audio acceleration, and the lack of good tooling and libraries goes hand in hand with that.
There's also the older Faust for similar synthesis tasks (the kind SOUL was aiming for), and there are some DSPs, like SHARC, that support Faust. But that's mostly a different world from GPUs: Faust doesn't do GPUs, and its developers aren't too interested in adding support; their more recent work in that direction targets FPGAs. Generally, they seem to prefer stand-alone embedded platforms on which you can build hands-on instruments, rather than PC-connected gear. Here's a fairly recent (2020) overview of what they do support in that regard; GPUs are in the very last section of that paper, with this:
Graphics Processing Units (GPUs) have been increasingly used in recent years for real-time audio processing applications, taking advantage of their high degree of parallelization to run DSP algorithms that can be easily divided into multiple processing units such as modal reverbs [19], etc.
We believe that FAUST has a role to play in that context by facilitating the programming of this type of platform. We did some experiment in 2010 by developing OpenCL and CUDA backends. At that time, results were not really convincing. Now that GPUs are becoming much more powerful, and with a better understanding of the class of DSP algorithms that can take benefit of their massive data parallelism capabilities, we plan to work again on this subject in the future.
The 2019 LAC paper they cite in that context is by two Stanford CCRMA researchers. It's worth looking at both for an actual application (a digital waveguide) and for its mini-survey; the survey is about a page long, so I won't quote all of it here, but the gist is that GPU audio has mostly been useful for specialized applications like physical models of instruments or massive speaker arrays:
Trebien et al. [7] use modal synthesis to produce realistic sounds for realtime collisions between objects of different materials. Noting that IIR filters do not traditionally perform well on GPUs, due to dependence on prior state not mapping well to the parallel nature of GPUs, they introduce a transform to change this into a linear convolution operator and to unlock time-axis parallelism. [...]
Belloch covers GPU-accelerated massively parallel filtering in [9], and Belloch et al. [10] leverage GPU acceleration to implement Wave Field Synthesis on a 96-speaker array, with nearly ten thousand fractional-delay room filters with thousands of taps. The maximum number of simulated sound sources is computed for different realtime buffer sizes and space partitions; with a 256-sample buffer (5.8ms at 44.1kHz), between 18 and 198 real-time sources could be placed in the field.
Bilbao and Webb[11] present a GPU-accelerated model of timpani, synthesizing sound at 44.1kHz in a 3D computational space within and outside the drum. The GPU approach uses a matrix-free implementation to obtain a 30x+ speedup over a MATLAB CPU-, sparse-matrix-based prototype, and a greater-than-7.5x speedup over single-threaded C code baseline. The largest (and most computationally-expensive) drum update equation is optimized to 2.04 milliseconds per sample, where the bottleneck is a linear system update for the drum membrane.
For more traditional synthesis techniques, they cite a pretty old paper that synthesized millions of sinusoids on the GPU in realtime, but they note that so far people haven't found much of an application for that many grains.
As far as I can tell, all these research apps have been written directly against GPU compute APIs like CUDA, not against any intermediate audio libraries, alas. So it's mostly a "do it from scratch" approach. The Stanford paper makes it fairly clear why that happened: you often need to resort to, e.g., the CUDA analysis tools to profile memory accesses.
Memory accesses should go to the fastest RAM possible, and we need to pay attention to memory alignment. While CPU code does benefit from similar optimizations, GPU algorithms rapidly fall in performance when parameters stray from the ideal range.
One other note around generalizing this code for end users: during development we consulted the capabilities of our particular graphics card several times, to see how many registers we have or to see the various ways we can slice shared memory. While these parameters may be queried from the card at runtime and for the most part newer and more powerful GPUs contain a superset of old resources, this is not a guarantee, and for example if we tried to run our compiled waveguide binary on a GTX 480 from several generations back, it would fail to run because we request too much shared memory.
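Those per-card limits can at least be queried at runtime through the CUDA runtime API rather than hard-coded, which is roughly the generalization problem the authors describe; a minimal sketch (mine, not theirs):

    // Query the per-block resources the quote talks about, so a launch
    // configuration can be chosen (or rejected) at runtime instead of
    // failing on an older card with less shared memory.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);       // device 0

        std::printf("device:                  %s\n", prop.name);
        std::printf("shared memory per block: %zu bytes\n", (size_t)prop.sharedMemPerBlock);
        std::printf("registers per block:     %d\n", prop.regsPerBlock);
        std::printf("max threads per block:   %d\n", prop.maxThreadsPerBlock);
        return 0;
    }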
Upvotes: 1
Reputation: 904
If you are looking for low-latency audio processing, maybe take a look at this: https://github.com/exporl/apex
It is a system for doing hearing measurements on people, developed by the experimental oto-rhino-laryngology (ExpORL) group at the University of Leuven (Belgium). It contains the Bertha library, which is made specifically for low-latency audio processing.
(And no, I don't think it uses the GPU)
Upvotes: 0
Reputation: 33
I totally agree with the other answers in that GPUs will (probably) have too much latency for online audio processing. They are designed to work on relatively large chunks of data.
For custom processing of streamed data (like your audio signals), you may want to look into dedicated hardware like DSPs or an FPGA if you really want all the customization. But then you're looking into working on hardware, rather than software - that's a rather different beast.
That being said, I've had a little experience with Alchitry's FPGA development boards and it's not quite as horrible as some people claim.
Upvotes: 2
Reputation: 51224
The difference is that video games actually avoid transmitting large amounts of data to the GPU. CPU-to-GPU transfer (and back) is an expensive operation. Once the textures and shaders (the GPU "code") have been transmitted, the GPU only needs to receive a bunch of vertices to render each individual frame, and it then sends the results directly to your display.
So the GPU wins in situations where the work can be parallelized and is large relative to the amount of data that has to be transmitted to and from the GPU. But to reduce audio latency, you must process relatively small chunks of data, as soon as possible.
For audio processing you would need to send the chunk of data received from the audio interface to the GPU, wait for the GPU to finish processing, then send the processed data back to the CPU, where it would be sent back to the audio interface.
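As a sketch of what that per-buffer round trip looks like in CUDA terms (a trivial gain kernel stands in for the real effect; the names and structure are mine, just to illustrate the data movement):

    // One audio buffer's round trip: upload, kernel, download. The two copies
    // and the kernel-launch overhead are paid on every single buffer, which is
    // what hurts at the small buffer sizes audio needs.
    #include <cuda_runtime.h>

    __global__ void apply_gain(float* samples, int n, float gain) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) samples[i] *= gain;           // stand-in for a real effect
    }

    // d_buf is a device buffer allocated once (cudaMalloc) and reused per call.
    void process_buffer(const float* in, float* out, int n, float* d_buf) {
        cudaMemcpy(d_buf, in, n * sizeof(float), cudaMemcpyHostToDevice);  // upload
        apply_gain<<<(n + 255) / 256, 256>>>(d_buf, n, 0.5f);              // process
        cudaMemcpy(out, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost); // download
    }

    int main() {
        const int n = 256;
        float in[n] = {0}, out[n] = {0}, *d_buf = nullptr;
        cudaMalloc((void**)&d_buf, n * sizeof(float));
        process_buffer(in, out, n, d_buf);       // in real use: once per audio callback
        cudaFree(d_buf);
        return 0;
    }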
Upvotes: 2
Reputation: 179877
Sorry, going to disappoint you straight away: I have tried using NVidia CUDA (the native library) for audio processing with neural networks. It's what my company does for a living, so we're pretty competent. We found that the typical NVidia card has too much latency. They're fast, that's not the problem - they can do many millions of operations in a millisecond. However, the DMA engine feeding data to the card typically has latencies of many milliseconds. Not so bad for video, bad for audio - video is often 60 Hz, whereas audio can be 48000 Hz.
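If you want to sanity-check that on your own card, a crude sketch (mine, not the poster's code) that times a tiny blocking host-to-device-and-back copy with a host clock shows the fixed transfer overhead that would be paid on every audio buffer, independent of any processing:

    // Time a tiny round trip (no processing) to see the fixed transfer cost.
    #include <cuda_runtime.h>
    #include <chrono>
    #include <cstdio>

    int main() {
        float host[64] = {0};                    // one very small audio buffer
        const size_t bytes = sizeof(host);
        float* dev = nullptr;
        cudaMalloc((void**)&dev, bytes);

        // Warm up so context creation and lazy initialization aren't timed.
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
        cudaDeviceSynchronize();

        auto t0 = std::chrono::steady_clock::now();
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
        auto t1 = std::chrono::steady_clock::now();

        std::printf("round trip: %.1f us\n",
                    std::chrono::duration<double, std::micro>(t1 - t0).count());
        cudaFree(dev);
        return 0;
    }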
Upvotes: 4