Reputation: 58049
I'm using TinySoundFont to use SF2 files on watchOS. I want to play the raw audio generated by the framework in real time (which means calling tsf_note_on
as soon as the corresponding button is pressed and calling tsf_render_short
as soon as new data is needed). I'm using an AVAudioSourceNode to achieve that.
Despite the sound rendering fine when I render it into a file, it's really noisy when played using the AVAudioSourceNode. (Based on the answer from Rob Napier, this might be because I ignore the timestamp property - I'm looking for a solution that addresses that concern.) What causes this issue and how can I fix it?
I'm looking for a solution that renders audio realtime and not precalculates it, since I want to handle looping sounds correctly as well.
You can download a sample GitHub project here.
import SwiftUI
import AVFoundation
struct ContentView: View {
@ObservedObject var settings = Settings.shared
init() {
settings.prepare()
}
var body: some View {
Button("Play Sound") {
Settings.shared.playSound()
if !settings.engine.isRunning {
do {
try settings.engine.start()
} catch {
print(error)
}
}
}
}
}
import SwiftUI
import AVFoundation
class Settings: ObservableObject {
static let shared = Settings()
var engine: AVAudioEngine!
var sourceNode: AVAudioSourceNode!
var tinySoundFont: OpaquePointer!
func prepare() {
let soundFontPath = Bundle.main.path(forResource: "GMGSx", ofType: "sf2")
tinySoundFont = tsf_load_filename(soundFontPath)
tsf_set_output(tinySoundFont, TSF_MONO, 44100, 0)
setUpSound()
}
func setUpSound() {
if let engine = engine,
let sourceNode = sourceNode {
engine.detach(sourceNode)
}
engine = .init()
let mixerNode = engine.mainMixerNode
let audioFormat = AVAudioFormat(
commonFormat: .pcmFormatInt16,
sampleRate: 44100,
channels: 1,
interleaved: false
)
guard let audioFormat = audioFormat else {
return
}
sourceNode = AVAudioSourceNode(format: audioFormat) { silence, timeStamp, frameCount, audioBufferList in
guard let data = self.getSound(length: Int(frameCount)) else {
return 1
}
let ablPointer = UnsafeMutableAudioBufferListPointer(audioBufferList)
data.withUnsafeBytes { (intPointer: UnsafePointer<Int16>) in
for index in 0 ..< Int(frameCount) {
let value = intPointer[index]
// Set the same value on all channels (due to the inputFormat, there's only one channel though).
for buffer in ablPointer {
let buf: UnsafeMutableBufferPointer<Int16> = UnsafeMutableBufferPointer(buffer)
buf[index] = value
}
}
}
return noErr
}
engine.attach(sourceNode)
engine.connect(sourceNode, to: mixerNode, format: audioFormat)
do {
try AVAudioSession.sharedInstance().setCategory(.playback)
} catch {
print(error)
}
}
func playSound() {
tsf_note_on(tinySoundFont, 0, 60, 1)
}
func getSound(length: Int) -> Data? {
let array = [Int16]()
var storage = UnsafeMutablePointer<Int16>.allocate(capacity: length)
storage.initialize(from: array, count: length)
tsf_render_short(tinySoundFont, storage, Int32(length), 0)
let data = Data(bytes: storage, count: length)
storage.deallocate()
return data
}
}
Upvotes: 2
Views: 712
Reputation: 299355
The AVAudioSourceNode
initializer takes a render block. In the mode you're using (live playback), this is a real-time callback, so you have a very tight deadline to fill the block with the requested data and return it so it can be played. You don't have a ton of time to do calculations. You definitely don't have time to access the filesystem.
In your block, you're re-computing an entire WAV every render cycle, then writing it to disk, then reading it from disk, then filling in the block that was requested. You ignore the timestamp requested, and always fill the buffer starting at sample zero. The mismatch is what's causing the buzzing. The fact that you're so slow about it is probably what's causing the pitch-drop.
Depending on the size of your files, the simplest way to implement this is to first decode everything into memory, and fill in the buffers for the timestamps and lengths requested. It looks like your C code already generates PCM data, so there's no need to convert it into a WAV file. It seems to already be in the right format.
Apple provides a good sample project for a Signal Generator that you should use as a starting point. Download that and make sure it works as expected. Then work to swap in your SF2 code. You may also find the video on this helpful: What’s New in AVAudioEngine.
The easiest tool to use here is probably an AVAudioPlayerNode. Your SoundFontHelper is making things much more complicated, so I've removed it and just call TSF directly from Swift. To do this, create a file called tsf.c
as follows:
#define TSF_IMPLEMENTATION
#include "tsf.h"
And add it to BridgingHeader.h
:
#import "tsf.h"
Simplify ContentView to this:
import SwiftUI
struct ContentView: View {
@ObservedObject var settings = Settings.shared
init() {
// You'll want error handling here.
try! settings.prepare()
}
var body: some View {
Button("Play Sound") {
settings.play()
}
}
}
And that leaves the new version of Settings, which is the meat of it:
import SwiftUI
import AVFoundation
class Settings: ObservableObject {
static let shared = Settings()
var engine = AVAudioEngine()
let playerNode = AVAudioPlayerNode()
var tsf: OpaquePointer
var outputFormat = AVAudioFormat()
init() {
let soundFontPath = Bundle.main.path(forResource: "GMGSx", ofType: "sf2")
tsf = tsf_load_filename(soundFontPath)
engine.attach(playerNode)
engine.connect(playerNode, to: engine.mainMixerNode, format: nil)
updateOutputFormat()
}
// For simplicity, this object assumes the outputFormat does not change during its lifetime.
// It's important to watch for route changes, and recreate this object if they occur. For details, see:
// https://developer.apple.com/documentation/avfaudio/avaudiosession/responding_to_audio_session_route_changes
func updateOutputFormat() {
outputFormat = engine.mainMixerNode.outputFormat(forBus: 0)
}
func prepare() throws {
// Start the engine
try AVAudioSession.sharedInstance().setCategory(.playback)
try engine.start()
playerNode.play()
updateOutputFormat()
// Configure TSF. The only important thing here is the sample rate, which can be different on different hardware.
// Core Audio has a defined format of "deinterleaved 32-bit floating point."
tsf_set_output(tsf,
TSF_STEREO_UNWEAVED, // mode
Int32(outputFormat.sampleRate), // sampleRate
0) // gain
}
func play() {
tsf_note_on(tsf,
0, // preset_index
60, // key (middle C)
1.0) // velocity
// These tones have a long falloff, so you want a lot of source data. This is 10s.
let frameCount = 10 * Int(outputFormat.sampleRate)
// Create a buffer for the samples
let buffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: AVAudioFrameCount(frameCount))!
buffer.frameLength = buffer.frameCapacity
// Render the samples. Do not mix. This buffer has been extended to
// the needed size by the assignment to `frameLength` above. The call to
// `assumingMemoryBound` is known to be correct because the format is Float32.
let ptr = buffer.audioBufferList.pointee.mBuffers.mData?.assumingMemoryBound(to: Float.self)
tsf_render_float(tsf,
ptr, // buffer
Int32(frameCount), // samples
0) // mixing (do not mix)
// All done. Play the buffer, interrupting whatever is currently playing
playerNode.scheduleBuffer(buffer, at: nil, options: .interrupts)
}
}
You can find the full version at my fork. You can also see the first commit, which is another approach that maintains your SoundFontHelper and does conversions to deal with it, but it's much simpler to just render the audio correctly in the first place.
Upvotes: 6