Reputation: 259
I'm currently working on a tkinter Python school project whose sole purpose is to generate images from audio files. I'm going to pick audio properties and use their values to generate unique abstract images, but I don't know which properties I can analyze to extract values from. I'm looking for guidance on which properties (audio frequency, amplitude, etc.) I can extract values from and use to generate the images with Python.
Upvotes: 1
Views: 1324
Reputation: 1037
Providing an additional avenue for exploration: you have some tools to explore this qualitatively (as opposed to quantitatively, using metrics derived from the audio signal as suggested in the great answers above).
As you mention, the objective is to generate unique abstract images from sound, so an interesting angle may be to apply some machine learning techniques and derive mood classification predictions from the source audio.
For instance, you could use the TensorFlow models in Essentia to predict the mood of the track and associate images you select with the mood scores generated. I would suggest going well beyond this and using the tkinter image creation tools to create your mappings to mood. Use pen and paper to develop your mapping strategy: are certain moods more angular or circular? What colour mappings will you select, and why? You have a great deal of freedom to create these mappings, so start simple, as complexity builds naturally.
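As a starting point for that pen-and-paper mapping, here is a minimal sketch of turning mood scores into a colour and a shape. Everything in it is an assumption for illustration: the `scores` dictionary stands in for whatever your mood model returns (it does not call Essentia), and the mood names, the hues in `MOOD_HUES`, and the "aggressive means angular" rule are arbitrary design choices you should replace with your own.

```python
import colorsys

# Hypothetical hue per mood (0.0-1.0 around the colour wheel); pick your own.
MOOD_HUES = {"happy": 0.14, "sad": 0.60, "aggressive": 0.0, "relaxed": 0.40}

def mood_colour(scores):
    """Use the hue of the strongest mood and its score as the saturation."""
    mood = max(scores, key=scores.get)
    r, g, b = colorsys.hsv_to_rgb(MOOD_HUES[mood], scores[mood], 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

def polygon_sides(scores, min_sides=3, max_sides=12):
    """More 'aggressive' gives fewer sides (angular); more 'relaxed' gives more (rounder)."""
    aggressive = scores.get("aggressive", 0.0)
    relaxed = scores.get("relaxed", 0.0)
    total = aggressive + relaxed
    roundness = 0.5 if total == 0 else relaxed / total
    return min_sides + round(roundness * (max_sides - min_sides))

# Made-up scores standing in for a model's output.
scores = {"happy": 0.2, "sad": 0.1, "aggressive": 0.9, "relaxed": 0.1}
colour = mood_colour(scores)
sides = polygon_sides(scores)
```

The hex colour string can be passed as the `fill` of a tkinter `Canvas` shape, and `sides` can decide how many vertices to give `create_polygon`.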
Using some simple mood predictions may be more useful for you as someone who has more qualitative experience of sound than the quantitative experience of an audio engineer. I think this may be worth making central to the report you write, documenting your mapping decisions and design process, if a report is a requirement of the task.
Upvotes: 1
Reputation: 51847
The question is very broad in its current form. (Bear in mind audio is not my area of expertise, so do keep an eye out for the opinion of people working in audio/audiovisual/generative fields.)
You can go about it either way: figure out what kind of image(s) you'd like to create from audio and from there figure out which audio features to use. The other way around is also valid: pick an audio feature you'd like to explore, then think of how you'd best or most interestingly represent that visually.
There's a distinction between image and images.
For a single image, the simplest thing I can think of is drawing a grid of squares where a visual property of the square (e.g. square size, fill colour intensity, etc.) is mapped to the amplitude at that time. The single image would visualise a whole track's amplitude pattern. Even with such a simple example there are many choices you can make: how often you sample, how you lay out the grid (cartesian, polar), and how each amplitude sample is visualised (it could be different shapes, sizes, colours, etc.).
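The grid idea could be sketched roughly as follows. This is only one possible take, under stated assumptions: the samples are a plain list of floats (a synthetic fading-in tone stands in for a real track), amplitude is measured as RMS per block, and the function names `block_rms`, `to_grey` and `draw_grid` are made up for this example.

```python
import math

def block_rms(samples, n_blocks):
    """Split samples into n_blocks equal chunks and return the RMS amplitude of each."""
    size = max(1, len(samples) // n_blocks)
    levels = []
    for i in range(n_blocks):
        chunk = samples[i * size:(i + 1) * size]
        if not chunk:
            levels.append(0.0)
            continue
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return levels

def to_grey(level, peak):
    """Map an amplitude level to a tkinter-style hex grey (louder is brighter)."""
    v = 0 if peak == 0 else round(255 * min(level / peak, 1.0))
    return f"#{v:02x}{v:02x}{v:02x}"

def draw_grid(levels, columns=8, cell=40):
    """Draw one square per amplitude value on a tkinter Canvas (opens a window)."""
    import tkinter as tk  # imported here so the helpers above work without a display
    rows = math.ceil(len(levels) / columns)
    root = tk.Tk()
    canvas = tk.Canvas(root, width=columns * cell, height=rows * cell)
    canvas.pack()
    peak = max(levels, default=0)
    for i, level in enumerate(levels):
        x, y = (i % columns) * cell, (i // columns) * cell
        canvas.create_rectangle(x, y, x + cell, y + cell,
                                fill=to_grey(level, peak), outline="")
    root.mainloop()

# Synthetic stand-in for real audio: a 440 Hz tone that fades in over one second.
samples = [math.sin(2 * math.pi * 440 * n / 8000) * (n / 8000) for n in range(8000)]
levels = block_rms(samples, 64)
# draw_grid(levels)  # uncomment to open the window
```

Swapping `to_grey` for a function that returns a size or a hue, or computing `x, y` from an angle and radius instead of a row and column, gives the other variations mentioned above.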
(Similar concept to CinemaRedux, simpler for audio only)
You can look into the field of data visualisation for inspiration.
Information is Beautiful is a great place to start.
If you want to generate multiple images, that seems to go into audiovisual territory (e.g. abstract animation, audio-reactive motion graphics, etc.). Your question originally had the Processing tag, which I removed; however, you could be using Processing's Python Mode.
In terms of audio visualisation, one good example I can think of is Robert Hodgin's work; see Magnetosphere and the Audio-generated landscape prototype. He uses frequency analysis (FFT) with a bit of smoothing/data massaging to amplify the elements useful for visualisation and dampen some of the noise.
(There are a few handy audio libraries such as Minim and Beads; however, I assume you're interested in using raw Python, not Jython, which is what the official Processing Python mode uses.) Here is an answer on FFT analysis for visualisation (even though it's in Processing Java, the principles can be applied in Python).
Personally I've only used pyaudio so far for basic audio tasks. I would assume you could use it for amplitude analysis, but for other more complex tasks you might need something extra. Doing a quick search, librosa pops up.
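To make the "FFT plus smoothing" idea concrete in plain Python, here is a small sketch. It is not the method used in the works mentioned above, just one common approach under my own assumptions: a hand-written radix-2 FFT (in practice you would likely use `numpy.fft`), a Hann window, equal-width frequency bands, and an exponential moving average as the smoothing step. The names `band_levels`, `smooth` and the `alpha` value are illustrative.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def band_levels(frame, n_bands):
    """Magnitude spectrum of one frame, averaged into n_bands equal-width bands."""
    n = len(frame)
    # Hann window to reduce spectral leakage at the frame edges.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    mags = [abs(c) for c in fft(windowed)[: n // 2]]
    width = len(mags) // n_bands
    return [sum(mags[b * width:(b + 1) * width]) / width for b in range(n_bands)]

def smooth(previous, current, alpha=0.3):
    """Exponential moving average: lower alpha means slower, calmer visuals."""
    return [alpha * c + (1 - alpha) * p for p, c in zip(previous, current)]

# Synthetic stand-in for audio: a 1250 Hz tone sampled at 8000 Hz.
rate, frame_size = 8000, 256
tone = [math.sin(2 * math.pi * 1250 * n / rate) for n in range(frame_size * 4)]

smoothed = [0.0] * 8
for start in range(0, len(tone), frame_size):
    smoothed = smooth(smoothed, band_levels(tone[start:start + frame_size], 8))
```

Each entry of `smoothed` could then drive one visual element per frame, for example the height of a bar or the radius of a ring.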
If what you want to achieve isn't clear, try prototyping first and start with the simplest audio analysis and visual elements you can think of (e.g. amplitude mapped to boxes over time). Constraints can be great for creativity, and the minimal approach could translate into cleaner, minimal visuals.
You can then look into FFT, MFCC, onset/beat detection, etc.
Another tool that could be useful for prototyping is Sonic Visualiser.
You can open a track and use some of the built-in feature extractors.
(You can even get away with exporting XML or CSV data from Sonic Visualiser, which you can load/parse in Python and use to render image(s).)
It uses a plugin system (similar to VST plugins in DAWs like Ableton Live, Apple Logic, etc.) called Vamp plugins. You can then use the VampPy Python wrapper if you need the data at runtime.
(You might also want to draw inspiration from other languages used for audiovisual artworks, like PureData + Gem, MaxMSP + Jitter, VVVV, etc.)
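Loading such an export in Python could look like the sketch below. The column layout is an assumption on my part (one row per point, time in seconds first and the value second, with any header or label rows skipped), so check it against the file you actually export. The function names are made up for the example.

```python
import csv
import io

def load_time_values(lines):
    """Parse rows of `time,value[,label]` into (time, value) float pairs.

    Rows that do not start with two numbers (e.g. a header) are skipped.
    """
    points = []
    for row in csv.reader(lines):
        try:
            points.append((float(row[0]), float(row[1])))
        except (ValueError, IndexError):
            continue
    return points

def normalise(points):
    """Rescale the values to the 0.0-1.0 range so they are easy to map to visuals."""
    values = [v for _, v in points]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a flat curve
    return [(t, (v - lo) / span) for t, v in points]

# In-memory stand-in for an exported file.
sample = io.StringIO("0.000,0.10\n0.500,0.40\n1.000,0.25\n")
points = normalise(load_time_values(sample))
```

For a real file, pass an open file object instead, e.g. `with open(path, newline="") as f: points = normalise(load_time_values(f))`, where `path` is wherever you saved the export.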
Upvotes: 2
Reputation: 338
Time domain: zero-crossing rate, root mean square energy, etc. Frequency domain: spectral bandwidth, flux, rolloff, flatness, MFCC, etc. Also tempo. You can use librosa for Python (link: https://librosa.org/doc/latest/index.html) for extraction from a .wav file; it implements the Fast Fourier Transform and framing. You can then apply statistics such as the mean and standard deviation to the vector of each of the above characteristics across the whole audio file.
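To show what the two time-domain features and the summary statistics amount to, here is a dependency-free sketch. It is a simplified stand-in rather than librosa's implementation (librosa offers `librosa.feature.zero_crossing_rate` and `librosa.feature.rms` for real use); the frame size, hop size and function names are assumptions chosen for the example.

```python
import math
import statistics

def frames(samples, size=1024, hop=512):
    """Yield overlapping frames of `size` samples, advancing by `hop` each time."""
    for start in range(0, len(samples) - size + 1, hop):
        yield samples[start:start + size]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def summarise(samples, size=1024, hop=512):
    """Mean and standard deviation of each feature across the whole signal."""
    zcr = [zero_crossing_rate(f) for f in frames(samples, size, hop)]
    rms = [rms_energy(f) for f in frames(samples, size, hop)]
    return {
        "zcr_mean": statistics.mean(zcr), "zcr_std": statistics.pstdev(zcr),
        "rms_mean": statistics.mean(rms), "rms_std": statistics.pstdev(rms),
    }

# Synthetic stand-ins for a .wav file: one second each of a low and a high tone.
rate = 8000
low = [math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
high = [math.sin(2 * math.pi * 2000 * n / rate) for n in range(rate)]
low_stats, high_stats = summarise(low), summarise(high)
```

The resulting handful of numbers per track is small enough to map directly onto image parameters such as a palette, a shape count or a line thickness.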
Upvotes: 1