user1812844

Reputation: 297

Detecting audio file integrity with python

If I download an audio file from the web and the download is interrupted or otherwise fails, how can I efficiently detect with Python that the audio file is incomplete?

One idea is to use the file command on Linux:

file audio.mp4

But it still identifies the broken file as MP4:

audio.mp4: ISO Media, MPEG v4 system, version 2

Even mplayer detects the mp4 audio type, but fails when trying to play it. I don't think launching mplayer from Python and checking whether it failed is a scalable solution, though.

Here is a sample of a broken file: https://www.dropbox.com/s/5rpscb9r1xrrx4t/They

Both mutagen and mp4file fail on the sample above: they hang indefinitely. The hang has to do with fileObject.tell().

Upvotes: 2

Views: 1460

Answers (1)

abarnert

Reputation: 366003

There are many different audio file formats, and container formats for things that may or may not be audio files.

Fortunately, there are libraries that can handle a wide variety of different kinds of files. And there are Python wrappers for:

  • Portable command-line tools like ffmpeg and mplayer.
  • Portable libraries like libavcodec (what ffmpeg uses).
  • Platform-specific libraries like Core Audio or QuickTime or Windows Media.
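
As a rough illustration of the command-line route, here is a minimal sketch that asks ffmpeg to decode the whole file and throw the result away. It assumes ffmpeg is installed and on PATH; the function names and the 60-second timeout are made up for this example, and treating any stderr output at the "error" log level as a failure is my own heuristic, not something ffmpeg promises.

```python
import shutil
import subprocess


def ffmpeg_check_command(path, ffmpeg="ffmpeg"):
    # -v error: only log errors; -f null -: decode everything, discard output.
    return [ffmpeg, "-v", "error", "-i", path, "-f", "null", "-"]


def decodes_cleanly(path, ffmpeg="ffmpeg", timeout=60):
    """Return True if ffmpeg decodes the file without reporting errors.

    Hypothetical helper: the name and the timeout value are assumptions.
    """
    if shutil.which(ffmpeg) is None:
        raise RuntimeError("ffmpeg executable not found: %r" % ffmpeg)
    try:
        result = subprocess.run(
            ffmpeg_check_command(path, ffmpeg),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A tool that hangs on a broken file is treated as a failed check.
        return False
    # Heuristic: nonzero exit status or any logged error means "not clean".
    return result.returncode == 0 and not result.stderr.strip()
```

Decoding the whole file is slow compared with just parsing the container, so this only makes sense if you need to know that the audio data itself is usable.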

If you're willing to use separate wrappers for separate file types, there are even more choices (e.g., libmp4v2 is great for MP4 files, but useless for anything else).

Of course there are huge tradeoffs: the more powerful libraries are often going to be more complex, or have more prerequisites. Do some searching at http://pypi.python.org/ to see what turns up; you should be able to find something that does everything you want.

For one really simple example, mp4file will attempt to parse any MPEG4 container. If it's incomplete, or has any invalid atoms, you'll get an exception. So, the check is just one line, mp4file.Mp4File(path). If it succeeds, it's complete; if it throws an exception, it's incomplete or invalid. But of course this will accept a complete MPEG4 video file, or MPEG4 with no audio or video in it, and it will reject a complete MP3, or even a complete M4A with one broken metadata tag.
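
If you would rather not depend on a third-party parser, the same kind of container check can be sketched with the standard library. This is not what mp4file does internally; it is a simplified stand-in that walks only the top-level atoms and verifies that each declared size fits inside the file. The function name is invented here, and requiring both an ftyp and a moov atom is an assumption that holds for typical MP4/M4A files but may reject some older QuickTime files.

```python
import os
import struct


def check_mp4_atoms(path):
    """Walk top-level MP4 atoms; return (ok, reason).

    Hypothetical helper, a sketch rather than a full validator.
    """
    file_size = os.path.getsize(path)
    seen = []
    with open(path, "rb") as f:
        offset = 0
        while offset < file_size:
            f.seek(offset)
            header = f.read(8)
            if len(header) < 8:
                return False, "truncated atom header at offset %d" % offset
            # Each atom starts with a 32-bit big-endian size and a 4-byte type.
            size, kind = struct.unpack(">I4s", header)
            header_len = 8
            if size == 1:
                # Size 1 means a 64-bit size follows the type field.
                ext = f.read(8)
                if len(ext) < 8:
                    return False, "truncated 64-bit size at offset %d" % offset
                size = struct.unpack(">Q", ext)[0]
                header_len = 16
            elif size == 0:
                # Size 0 means the atom runs to the end of the file.
                size = file_size - offset
            if size < header_len:
                # Rejecting this also guarantees the loop always advances.
                return False, "invalid size for atom %r at offset %d" % (kind, offset)
            if offset + size > file_size:
                return False, "atom %r at offset %d runs past end of file" % (kind, offset)
            seen.append(kind)
            offset += size
    for required in (b"ftyp", b"moov"):  # assumption: both must be present
        if required not in seen:
            return False, "missing %r atom" % required
    return True, "ok"
```

The same caveats apply as with mp4file: this says nothing about whether the file actually contains audio, and it does not look inside moov or validate the media data.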

Upvotes: 2
