Roman

Reputation: 9441

Additional information for songs recognised by dejavu.py

I'm currently investigating dejavu.py, and I must say that I am quite impressed by it so far, though I find the docs a bit incomplete when it comes to the user-facing interface.
When you recognise a song from a file with oDjv.recognize(FileRecognizer, sFile), it returns a dictionary that looks like this:

{'song_id': 2, 'song_name': 'Sean-Fournier--Falling-For-You', 'file_sha1': 'A9D18B9B9DAA467350D1B6B249C36759282B962E', 'confidence': 127475, 'offset_seconds': 0.0, 'match_time': 32.23410487174988, 'offset': 0}

And from recording (oDjv.recognize(MicrophoneRecognizer, seconds=iSecs)):

{'song_id': 2, 'song_name': 'Sean-Fournier--Falling-For-You', 'file_sha1': 'A9D18B9B9DAA467350D1B6B249C36759282B962E', 'confidence': 124, 'offset_seconds': 24.89179, 'offset': 536}

So, to the questions:
1) What exactly is confidence, and is there an upper bound for the confidence level?

2) What is the difference between offset_seconds and offset?

3) Why does it take the algorithm somewhere between 30 and 60 seconds (in the case of all tests I ran) to identify the song from disk, but it can do it in 10 or so seconds when recording audio?

4) When running the function to record from audio, I get the following output before the actual result from the function (even when recognition succeeds). What is it trying to do here?

ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
bt_audio_service_open: connect() failed: Connection refused (111)
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started

5) Is there an online music database that I can just plug into the config?

from dejavu import Dejavu

dConfig = {
    "database": {
        "host": "some magical music database",
        "user": "root",
        "passwd": "",
        "db": "dejavu"
    }
}

oDjv = Dejavu(dConfig)

Upvotes: 3

Views: 2582

Answers (2)

HappyPrime

Reputation: 45

Everything's been answered really well already; this is just further clarification for 1).

The reason there are many thousands of fingerprints per file is that Dejavu seeks to recognise songs based on sound, regardless of the length of the song sample, the position of the sample in the song, or any noise that might be in the recording (it tries to fulfil the same purpose as Shazam). Each fingerprint is made from a number of data samples of the audio content itself, resulting in a potentially vast number of fingerprints. Dejavu has many tunable parameters that affect the size and number of the fingerprints obtained, so it can be fine-tuned for your own requirements.

If we used only one fingerprint per file then the only way a match could be found is if you fed it exactly that same file.
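To make that concrete, here is a rough sketch of the idea, not Dejavu's actual code. It assumes the audio has already been reduced to a list of (time_bin, freq_bin) peaks; the names fingerprints, best_alignment and FAN_VALUE are made up for illustration. Each peak is paired with a few later peaks, each pair is hashed, and a clip is matched by counting how many of its hashes line up with the song at the same time shift. That count plays the role of the confidence value.

```python
import hashlib
from collections import Counter

FAN_VALUE = 3  # hypothetical: number of later peaks each peak is paired with


def fingerprints(peaks, fan_value=FAN_VALUE):
    """Yield (hash, time_bin) pairs from a time-sorted list of (time_bin, freq_bin) peaks."""
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_value]:
            key = f"{f1}|{f2}|{t2 - t1}".encode()
            # Keep only part of the SHA-1 to save storage.
            yield hashlib.sha1(key).hexdigest()[:20], t1


def best_alignment(db_prints, sample_prints):
    """Return (time shift, number of hashes agreeing on that shift)."""
    index = {}
    for h, t in db_prints:
        index.setdefault(h, []).append(t)
    shifts = Counter()
    for h, t in sample_prints:
        for db_t in index.get(h, []):
            shifts[db_t - t] += 1
    return shifts.most_common(1)[0] if shifts else (None, 0)


# Synthetic "song": 100 peaks, one per time bin, with distinct frequencies.
song = [(t, (t * 37) % 101) for t in range(100)]
# A clip taken 50 bins into the song; its times are relative to the clip start.
clip = [(t - 50, f) for t, f in song[50:90]]

song_prints = list(fingerprints(song))
clip_prints = list(fingerprints(clip))
offset, confidence = best_alignment(song_prints, clip_prints)
print(len(song_prints), offset, confidence)
```

Even this toy song produces a few hundred fingerprints, and the clip is located purely from the hashes it shares with the song, wherever in the song it was cut from.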

@tkhurana96, sorry, I don't have the reputation to reply to a comment yet, but hopefully that clarifies things for you.

Upvotes: 2

lollercoaster

Reputation: 16493

Most of your questions are answered either in the Dejavu GitHub README.md or in the writeup and explanation here.

But to answer each of your numbered questions:

  1. In Dejavu, confidence is the number of fingerprint hashes that "aligned" between the current audio clip and its closest match in the database. There's no probabilistic interpretation and no fixed upper bound. Keep in mind there can be many thousands of fingerprints per audio file, so use that as a reference point.
  2. They are the same duration of time, but in different units. offset_seconds is expressed in seconds, and offset is expressed in units of the algorithm's time bins.
  3. Dejavu fingerprints most songs at 3x listening speed, so a 3 minute song takes around a minute to process, which is longer than a short audio clip that it listens to for 10 seconds. You can adjust how long the default command line mic recognition takes by using python dejavu.py --recognize mic 5, which listens for 5 seconds instead of the default of 10. FYI, one of the best options of the library is that you can also change the number of seconds Dejavu uses for on-disk recognition in the JSON config file with the fingerprint_limit key.
  4. There is something wrong with your installation or perhaps you are using a virtual machine which doesn't know how to record audio and get it into pyaudio. In your case see this solution, perhaps it might help.
  5. There is no online music database; you plug into your own MySQL or (soon) PostgreSQL database and record your own fingerprints. Dejavu is meant for recognizing all sorts of pre-recorded audio. Plus, each user's needs are different. Want more accurate fingerprinting at the expense of more fingerprints? Raise the DEFAULT_FAN_VALUE. Need higher collision guarantees but don't mind the extra storage cost? You can decrease the FINGERPRINT_REDUCTION and keep more characters of each SHA-1. Dejavu is meant to adapt to many different use cases, which necessarily means that if you change the fingerprinting parameters in this file, your database will have a different distribution and structure.
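As a worked example for point 2, the conversion from offset to offset_seconds can be sketched as below. The constants are assumptions (a 44100 Hz sample rate, a 4096-sample window and a 0.5 overlap ratio), so check them against the fingerprinting settings in your own copy of Dejavu; offset_to_seconds is a name made up for this sketch. With those values, the offset of 536 from the microphone result in the question comes out at the 24.89179 seconds it reported.

```python
# Assumed values; verify them against your own Dejavu fingerprinting settings.
SAMPLE_RATE = 44100    # audio samples per second
WINDOW_SIZE = 4096     # samples per FFT window
OVERLAP_RATIO = 0.5    # consecutive windows overlap by half


def offset_to_seconds(offset_bins):
    """Convert an offset in time bins to seconds (hypothetical helper)."""
    # With a 0.5 overlap, each time bin advances by half a window of samples.
    samples_per_bin = WINDOW_SIZE * OVERLAP_RATIO
    return round(offset_bins * samples_per_bin / SAMPLE_RATE, 5)


print(offset_to_seconds(536))  # 24.89179
```

If you change the window size or overlap, the same offset will map to a different number of seconds, so offset values are only comparable between runs that use the same settings.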

Upvotes: 6
