Reputation: 6259
I am working on a project involving machine learning and data comparison.
For the purpose of this project, I am feeding abstracted video data to a neural network.
Now, abstracting image data is quite simple. I can take still-frames at certain points in the video, scale them down to 5 by 5 pixels (or any other manageable resolution) and get the pixel values for analysis.
The resulting data gives a unique, small and somewhat data-rich sample (even 5 samples of 5x5 px are enough to distinguish a drama from a nature documentary, etc.).
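For reference, here is a minimal sketch of that frame-abstraction step, assuming OpenCV is available (the file path and timestamps are placeholders):

```python
import cv2
import numpy as np

def abstract_frames(video_path, times_s, size=(5, 5)):
    """Sample frames at the given timestamps and flatten each
    to a small grayscale pixel vector."""
    cap = cv2.VideoCapture(video_path)
    vectors = []
    for t in times_s:
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)  # seek to timestamp
        ok, frame = cap.read()
        if not ok:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray, size, interpolation=cv2.INTER_AREA)
        vectors.append(small.flatten() / 255.0)  # normalise to [0, 1]
    cap.release()
    return np.array(vectors)  # shape: (n_frames, 25) for 5x5
```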
However, I am stuck on the audio part. Since audio consists of samples and each sample by itself has no inherent meaning, I can't find a way to abstract audio down into processable blocks.
Are there common techniques for this process? If not, what metrics could be used to quantify and abstract audio data?
Upvotes: 0
Views: 170
Reputation: 9159
The process you require is audio feature extraction. A large number of feature detection algorithms exist, usually specialising in either music or speech signals. For music, chroma, rhythm and harmonic distribution are all features you might extract, along with many more. Typically, audio feature extraction algorithms work at a fairly macro level - that is to say, thousands of samples at a time.
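As a rough illustration, here is a minimal sketch of macro-level feature extraction using MFCCs via the librosa library (the library, frame sizes and summary statistics here are my own assumptions, not the only sensible choices):

```python
import librosa
import numpy as np

def abstract_audio(audio_path, n_mfcc=13):
    """Split the signal into short frames (~2048 samples each) and
    summarise the whole clip by the mean and spread of its MFCCs."""
    y, sr = librosa.load(audio_path, sr=22050, mono=True)
    # Each MFCC column describes one ~93 ms window of audio,
    # i.e. thousands of raw samples collapsed into a few coefficients.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=2048, hop_length=512)
    # Collapse the time axis into one fixed-length vector per clip,
    # which can then be fed to a network alongside the video vectors.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```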
A good place to get started is Sonic Visualiser, which is a host for audio analysis plug-ins - many of which are feature extractors.
YAAFE (Yet Another Audio Feature Extractor) may also have some useful stuff in it.
Upvotes: 1