fwind

Reputation: 1314

Freebase: Instances missing in dump

I want to extract all companies from the Freebase dump. However, several instances appear to be missing from the dump.

For example, Volkswagen (/m/07ywl) does not seem to be included. I searched for the MID using the following regex but could not find any results:

zgrep 'rdf\.freebase\.com/ns/m\.07ywl>' freebase-rdf.gz > res.rdf

The MID should be valid since it is stated on the corresponding Wikidata page and is the top result for Volkswagen when searching for it using the Knowledge Graph API:

https://kgsearch.googleapis.com/v1/entities:search?query=volkswagen&key=<API-KEY>&limit=5&indent=True

Upvotes: 1

Views: 146

Answers (2)

Albert

Reputation: 211

I had the same problem on Ubuntu 18.04: zgrep was treating the decompressed data as binary, so the matching lines were not printed. Adding the -a flag, which makes grep process the data as text, fixed the problem for me:

zgrep -a 'rdf\.freebase\.com/ns/m\.07ywl>' freebase-rdf.gz
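
To see the effect of -a without scanning the full dump, here is a minimal sketch on a tiny throwaway file. The sample triple and the NUL byte are invented for the demonstration, and the exact "binary file matches" message depends on your grep version:

```shell
# Build a tiny gzip file with a NUL byte ahead of the line of interest.
# The contents are made-up sample data, not an excerpt from the real dump.
demo=$(mktemp -d)
{
    printf 'junk\0junk\n'
    printf '<http://rdf.freebase.com/ns/m.07ywl>\t<http://rdf.freebase.com/ns/type.object.name>\t"Volkswagen"@en\t.\n'
} | gzip > "$demo/sample.gz"

pattern='rdf\.freebase\.com/ns/m\.07ywl>'

# Without -a, GNU grep typically only reports that a binary file matches.
gzip -dc "$demo/sample.gz" | grep "$pattern"

# With -a, the data is treated as text and the matching line is printed.
with_a=$(gzip -dc "$demo/sample.gz" | grep -a "$pattern")
printf '%s\n' "$with_a"
```

If the first command prints a "binary file matches" notice instead of the triple, the same thing is probably happening with the real dump.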

Upvotes: 2

Tom Morris

Reputation: 10540

That entity exists in freebase-rdf-2015-04-19-00-00.gz, so I'd be pretty surprised if it didn't exist in the final dump from a few months later (2015-08-09), since the database was write-locked for everyone except a few Google admins.

My first guess would be that you have a truncated or corrupted download. Did you check the length and MD5 checksum after download?
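
A rough sketch of such a check, assuming GNU coreutils (md5sum) and gzip are available. The function name verify_dump is made up for illustration, and the expected checksum has to come from wherever you obtained the dump:

```shell
# verify_dump FILE EXPECTED_MD5
# Hypothetical helper: returns 0 only if FILE is a complete gzip archive
# and its MD5 checksum equals EXPECTED_MD5.
verify_dump() {
    file=$1
    expected=$2

    # gzip -t decompresses the whole archive without writing output;
    # it fails on truncated or corrupted data.
    if ! gzip -t "$file"; then
        echo "gzip integrity test failed: $file" >&2
        return 1
    fi

    # md5sum prints "<hash>  <name>"; keep only the hash.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        echo "MD5 mismatch: expected $expected, got $actual" >&2
        return 1
    fi

    # Print the byte count so it can be compared with the published size.
    echo "OK: $file ($(wc -c < "$file") bytes)"
}
```

Usage would look like `verify_dump freebase-rdf.gz <published-md5>`. On a file of this size both steps read the whole archive, so expect them to take a while.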

Upvotes: 0
