olegthesoundman

Reputation: 125

VoiceProcessingIO Audio Unit adds an unexpected input stream to Built-in output device (macOS)

I work on a VoIP app on macOS and use the VoiceProcessingIO Audio Unit for audio processing such as echo cancellation and automatic gain control.

The problem is, when I init the audio unit, the list of Core Audio devices changes: not only is a new aggregate device added (which the VP audio unit uses for its own needs), but the built-in output device (e.g. "Built-in MacBook Pro Speakers") now also appears as an input device, i.e. it has an unexpected input stream in addition to its output ones.

This is the list of INPUT devices (aka "microphones") I get from Core Audio before initialising my VP AU:

DEVICE:      INPUT   45      BlackHole_UID
DEVICE:      INPUT   93      BuiltInMicrophoneDevice

This is the same list after my VP AU is initialised:

DEVICE:      INPUT   45      BlackHole_UID
DEVICE:      INPUT   93      BuiltInMicrophoneDevice
DEVICE:      INPUT   86      BuiltInSpeakerDevice /// WHY?
DEVICE:      INPUT   98      VPAUAggregateAudioDevice-0x101046040

This is very frustrating because I need to display a list of devices in the app, and even though I can safely filter out aggregate devices (they are not usable with the VP AU anyway), I cannot exclude the built-in MacBook speaker device.

Maybe someone of you has already been through this and has a clue what's going on and whether it can be fixed, e.g. some kAudioObjectPropertyXX I need to check in order to exclude the device from the input list. Of course, this might be a bug/feature on Apple's side and I simply have to hack my way around it.

The VP AU works well, and the problem reproduces regardless of the devices used (I tried built-in as well as external/USB/Bluetooth devices). It reproduces on every macOS version I could test, from 10.13 up to and including 11.0, on different Macs, and with different sets of audio devices connected. What puzzles me is that there is next to zero information on this problem available, which makes me think I did something wrong.

One more strange thing: while the VP AU is working, the HALLab app indicates something different: the built-in input gains two more input streams (OK, I could live with it if that were all!), but it does not show any input streams added to the built-in output, unlike what my app sees.

Here is an extract from the C++ code showing how I set up the VP Audio Unit:

#define MAX_FRAMES_PER_CALLBACK 1024

AudioComponentInstance AvHwVoIP::getComponentInstance(OSType type, OSType subType) {
    AudioComponentDescription desc = {0};
    desc.componentFlags = 0;
    desc.componentFlagsMask = 0;
    desc.componentManufacturer = kAudioUnitManufacturer_Apple;
    desc.componentSubType = subType;
    desc.componentType = type;

    AudioComponent ioComponent = AudioComponentFindNext(NULL, &desc);
    AudioComponentInstance unit;
    OSStatus status = AudioComponentInstanceNew(ioComponent, &unit);
    if (status != noErr) {
        printf("Error: %d\n", status);
    }
    return unit;
}

void AvHwVoIP::enableIO(uint32_t enableIO, AudioUnit auDev) {
    // Element 1 (input bus), input scope: enable/disable input on the AUHAL.
    setAudioUnitProperty(auDev,
                         kAudioOutputUnitProperty_EnableIO,
                         kAudioUnitScope_Input,
                         1,
                         &enableIO,
                         sizeof(enableIO));
    // Element 0 (output bus), output scope: enable/disable output on the AUHAL.
    setAudioUnitProperty(auDev,
                         kAudioOutputUnitProperty_EnableIO,
                         kAudioUnitScope_Output,
                         0,
                         &enableIO,
                         sizeof(enableIO));
}

void AvHwVoIP::setDeviceAsCurrent(AudioUnit auDev, AudioUnitElement element, AudioObjectID devId) {
    // Set the current device on the AUHAL.
    // This should be done only after IO has been enabled on the AUHAL.
    setAudioUnitProperty(auDev,
                         kAudioOutputUnitProperty_CurrentDevice,
                         element == 0 ? kAudioUnitScope_Output : kAudioUnitScope_Input,
                         element,
                         &devId,
                         sizeof(AudioDeviceID));
}

void AvHwVoIP::setAudioUnitProperty(AudioUnit auDev,
                                    AudioUnitPropertyID inID,
                                    AudioUnitScope inScope,
                                    AudioUnitElement inElement,
                                    const void* __nullable inData,
                                    uint32_t inDataSize) {
    OSStatus status = AudioUnitSetProperty(auDev, inID, inScope, inElement, inData, inDataSize);
    if (noErr != status) {
        std::cout << "****** ::setAudioUnitProperty failed: " << status << std::endl;
    }
}

void AvHwVoIP::start() {
    m_auVoiceProcesing = getComponentInstance(kAudioUnitType_Output, kAudioUnitSubType_VoiceProcessingIO);
    enableIO(1, m_auVoiceProcesing);
    m_format_description = SetAudioUnitStreamFormatFloat(m_auVoiceProcesing);
    SetAudioUnitCallbacks(m_auVoiceProcesing);
    setDeviceAsCurrent(m_auVoiceProcesing, 0, m_renderDeviceID); // output device AudioDeviceID here
    setDeviceAsCurrent(m_auVoiceProcesing, 1, m_capDeviceID);    // input device AudioDeviceID here
    setInputLevelListener();
    setVPEnabled(true);
    setAGCEnabled(true);

    UInt32 maximumFramesPerSlice = 0;
    UInt32 size = sizeof(maximumFramesPerSlice);
    OSStatus s1 = AudioUnitGetProperty(m_auVoiceProcesing, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, &maximumFramesPerSlice, &size);

    printf("max frames per callback: %d\n", maximumFramesPerSlice);

    maximumFramesPerSlice = MAX_FRAMES_PER_CALLBACK;
    s1 = AudioUnitSetProperty(m_auVoiceProcesing, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, &maximumFramesPerSlice, size);

    OSStatus status = AudioUnitInitialize(m_auVoiceProcesing);
    if (noErr != status) {
        printf("*** error AU initialize: %d\n", status);
    }

    status = AudioOutputUnitStart(m_auVoiceProcesing);
    if (noErr != status) {
        printf("*** AU start error: %d\n", status);
    }
}

And here is how I get my list of devices:

// Does this device have input/output streams?
bool hasStreamsForCategory(AudioObjectID devId, bool input)
{
    const AudioObjectPropertyScope scope = (input ? kAudioObjectPropertyScopeInput : kAudioObjectPropertyScopeOutput);

    AudioObjectPropertyAddress propertyAddress{kAudioDevicePropertyStreams, scope, kAudioObjectPropertyElementWildcard};

    uint32_t dataSize = 0;
    OSStatus status = AudioObjectGetPropertyDataSize(devId,
                                                     &propertyAddress,
                                                     0,
                                                     NULL,
                                                     &dataSize);
    if (noErr != status)
        printf("%s: Error in AudioObjectGetPropertyDataSize: %d\n", __FUNCTION__, status);

    return (dataSize / sizeof(AudioStreamID)) > 0;
}
    
std::set<AudioDeviceID> scanCoreAudioDeviceUIDs(bool isInput)
{
    std::set<AudioDeviceID> deviceIDs{};

    // Find out how many audio devices there are.
    AudioObjectPropertyAddress propertyAddress = {kAudioHardwarePropertyDevices, kAudioObjectPropertyScopeGlobal, kAudioObjectPropertyElementMaster};

    uint32_t dataSize{0};
    OSStatus err = AudioObjectGetPropertyDataSize(kAudioObjectSystemObject, &propertyAddress, 0, NULL, &dataSize);
    if (err != noErr)
    {
        printf("%s: AudioObjectGetPropertyDataSize: %d\n", __FUNCTION__, err);
        return deviceIDs; // empty
    }

    // Calculate the number of devices available.
    uint32_t devicesAvailable = dataSize / sizeof(AudioObjectID);
    if (devicesAvailable < 1)
    {
        printf("%s: Core audio available devices were not found\n", __FUNCTION__);
        return deviceIDs; // empty
    }

    AudioObjectID devices[devicesAvailable]; // devices to get

    err = AudioObjectGetPropertyData(kAudioObjectSystemObject, &propertyAddress, 0, NULL, &dataSize, devices);
    if (err != noErr)
    {
        printf("%s: Core audio available devices were not found\n", __FUNCTION__);
        return deviceIDs; // empty
    }

    for (uint32_t i = 0; i < devicesAvailable; ++i)
    {
        if (!hasStreamsForCategory(devices[i], isInput)) {
            continue;
        }

        printf("DEVICE: \t %s \t %d \t %s\n", isInput ? "INPUT" : "OUTPUT", devices[i], deviceUIDFromAudioDeviceID(devices[i]).c_str());

        deviceIDs.insert(devices[i]);
    }

    return deviceIDs;
}

Upvotes: 3

Views: 867

Answers (2)

Mikey

Reputation: 1332

I ran into this problem too. The application I'm working on also needs to observe audio device changes, which can occur while capturing/rendering, so I couldn't reliably ignore devices before/after the VPIO unit was created.

After a bit of digging, I found a working solution: when iterating through devices, filter out private aggregate devices as well as devices whose input streams all have an unknown terminal type.

You can filter out private aggregate devices by using kAudioDevicePropertyTransportType to detect aggregate devices and kAudioAggregateDevicePropertyComposition to get data about them: it returns a CFDictionaryRef whose kAudioAggregateDeviceIsPrivateKey entry carries the value we care about.

To inspect the streams, you can iterate through each AudioStreamID listed in kAudioDevicePropertyStreams. Checking a stream's kAudioStreamPropertyDirection tells you whether it's input or output, and kAudioStreamPropertyTerminalType tells you its type. If a device's input streams all have type kAudioStreamTerminalTypeUnknown and the device also has output streams, it's safe to filter it out of the input list.
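The decision rule can be sketched in isolation; this is my reading of the heuristic, not Chromium's actual code. `StreamInfo` and `looksLikeRealInputDevice` are hypothetical names, and on macOS the direction and terminal type would come from kAudioStreamPropertyDirection and kAudioStreamPropertyTerminalType (kAudioStreamTerminalTypeUnknown is 0):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-in for a queried stream description.
struct StreamInfo {
    bool isInput;          // stream direction: true = input
    uint32_t terminalType; // 0 = unknown terminal type
};

constexpr uint32_t kTerminalTypeUnknown = 0;

// A device counts as a real input device if it has at least one input stream
// with a defined terminal type, or if it has input streams and no output
// streams at all (some legitimate devices don't report terminal types).
bool looksLikeRealInputDevice(const std::vector<StreamInfo>& streams) {
    int definedInputs = 0, undefinedInputs = 0, outputs = 0;
    for (const StreamInfo& s : streams) {
        if (!s.isInput)
            ++outputs;
        else if (s.terminalType == kTerminalTypeUnknown)
            ++undefinedInputs;
        else
            ++definedInputs;
    }
    return definedInputs > 0 || (undefinedInputs > 0 && outputs == 0);
}
```

Under this rule, the built-in speaker (one unknown-type input tap plus its output streams) is rejected from the input list, while the built-in microphone (a defined microphone-type input stream) still passes.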

I found the solution by looking through the Chromium source code, which tells me that this solution must work pretty well :)

See IsPrivateAggregateDevice and IsInputDevice.

Cheers!

Upvotes: 2

olegthesoundman

Reputation: 125

Well, answering my own question four months after I reported this issue to Apple, because today the Apple Feedback Assistant responded to my request:

"There are two things you were noticing, both of which are expected and considered as implementation details of AUVP:

  1. The speaker device has input stream - this is the reference tap stream for echo cancellation.
  2. There is additional input stream under the built-in mic device - this is the raw mic streams enabled by AUVP.

For #1, We'd advise you to treat built-in speaker and (on certain Macs) headphone with special caution when determining whether it’s input/output device based on its input/output streams.

For #2, We'd advise you to ignore the extra streams on the device."

So they suggest doing exactly what I had already done by then:

  • determine the built-in output device before starting the AU and just memorise it;
  • ignore any extra streams that appear on built-in devices during VP AU operation.
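That bookkeeping can be sketched as follows; `InputDeviceFilter` is a hypothetical helper, just illustrating "snapshot the inputs before the AU starts, consult the snapshot while it runs":

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

using AudioDeviceID = uint32_t; // stand-in for the Core Audio typedef

// Hypothetical helper: capture the set of input-capable device IDs *before*
// the VP AU is created, then consult that snapshot while the AU is running,
// so streams the AU adds to built-in devices don't change the device list.
class InputDeviceFilter {
public:
    explicit InputDeviceFilter(std::set<AudioDeviceID> preAuInputs)
        : preAuInputs_(std::move(preAuInputs)) {}

    // True only for devices that were already inputs before the AU started.
    bool isRealInput(AudioDeviceID dev) const {
        return preAuInputs_.count(dev) != 0;
    }

private:
    std::set<AudioDeviceID> preAuInputs_;
};
```

Using the device IDs from the question: snapshot {45, 93} before starting the AU; while it runs, 86 (the speaker's tap) and 98 (the private aggregate) are rejected. Note that a snapshot alone misses devices hot-plugged during a call, which is why the stream-type heuristic from the other answer is useful when you must react to device-list changes mid-session.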

Upvotes: 5
