Gidi

Reputation: 191

Getting actor IDs and biographies from the data dumps or Freebase API

Does anyone know the best way to get actor IDs from the Freebase data dumps, and later to get the IMDB IDs and biographies from the Freebase API?

Upvotes: 2

Views: 924

Answers (1)

Tom Morris

Reputation: 10540

Actors will have the type /film/actor and look like this in the dump:

ns:m.010q36     rdf:type        ns:film.actor.

You can find them all in a few minutes from the compressed dump with a simple grep:

zgrep $'rdf:type\tns:film.actor.' freebase-rdf-<date of dump>.gz | cut -f 1 | cut -d ':' -f 2 > actor-mids.txt

This generates a list of MIDs in the form m.010q36, which is how the dump writes the MID /m/010q36.

Using the list of MIDs, look for all lines that have one of those MIDs in the first column and one of your desired properties in the second. You can do this with Python, grep, or the tool or language of your choice. Of course, if you're using a programming language like Python, you could roll the initial search for actors into the same script.
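As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes the dump lines look like the samples in this answer (whitespace-separated subject, predicate, and value, with a trailing period); the function names are made up for the example, and the parsing is deliberately naive rather than a real RDF parser.

```python
import gzip


def parse_line(line):
    """Split one dump line into (mid, predicate, value), or return None.

    Assumes the layout shown in the samples: subject, predicate, value,
    then a terminating period.
    """
    parts = line.split(None, 2)  # subject, predicate, rest of the line
    if len(parts) < 3:
        return None
    subject, predicate, value = parts
    value = value.strip()
    if value.endswith("."):  # statement terminator
        value = value[:-1].strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]  # drop the quotes around plain string values
    mid = subject.split(":", 1)[-1]  # "ns:m.010q36" -> "m.010q36"
    return mid, predicate, value


def collect_actor_mids(lines):
    """The initial search: MIDs of everything typed as film.actor."""
    mids = set()
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1] == "rdf:type" and parsed[2] == "ns:film.actor":
            mids.add(parsed[0])
    return mids


def filter_triples(lines, mids, predicates):
    """Yield (mid, predicate, value) for wanted MIDs and wanted properties."""
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1] in predicates and parsed[0] in mids:
            yield parsed


def read_dump(path):
    """Stream a gzip-compressed dump line by line without unpacking it."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from fh
```

This takes two passes over the dump: one call to collect_actor_mids(read_dump(path)) to build the set, then filter_triples(read_dump(path), mids, {"ns:type.object.key"}) to pull the keys. Pass whichever other properties you need in the predicates set.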

Wikipedia and IMDB IDs are stored as what Freebase calls keys and look like this (MusicBrainz & Netflix included too):

ns:m.010q36     ns:type.object.key      "/wikipedia/en/Mr$002ERodgers".
ns:m.010q36     ns:type.object.key      "/authority/imdb/name/nm0736872".
ns:m.010q36     ns:type.object.key      "/authority/musicbrainz/87467525-3724-412d-ad3e-595ecb6a3bfd".
ns:m.010q36     ns:type.object.key      "/authority/netflix/role/30006685".

Keys may be encoded (like the Wikipedia key above). You can find documentation on the Freebase wiki on how to deal with them.
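For illustration, here is a small Python sketch of decoding such a key. It assumes the escape is a `$` followed by four hexadecimal digits giving a Unicode code point, which is what the `$002E` in the Wikipedia key above suggests (U+002E is a period). The function names are invented for the example, so check the Freebase documentation for the exact rules.

```python
import re

# Assumed escape format: "$" followed by exactly four hex digits.
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def decode_key(value):
    """Replace each $XXXX escape with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


def split_key(key):
    """Split a full key into (namespace, decoded value) at the last slash."""
    namespace, _, value = key.rpartition("/")
    return namespace, decode_key(value)
```

With this, split_key("/authority/imdb/name/nm0736872") gives the namespace "/authority/imdb/name" and the IMDB ID "nm0736872", so filtering on the namespace is enough to pick out the IMDB keys.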

Upvotes: 4
