Reputation: 1314
I want to extract all companies from the Freebase dump. However, several entities appear to be missing from it.
For example, Volkswagen (/m/07ywl) does not seem to be included. I searched for the MID using the following regex but could not find any results:
zgrep 'rdf\.freebase\.com/ns/m\.07ywl>' freebase-rdf.gz > res.rdf
The MID should be valid since it is stated on the corresponding Wikidata page and is the top result for Volkswagen when searching for it using the Knowledge Graph API:
https://kgsearch.googleapis.com/v1/entities:search?query=volkswagen&key=<API-KEY>&limit=5&indent=True
Upvotes: 1
Views: 146
Reputation: 211
I was having the same problem on Ubuntu 18.04 because zgrep was treating the decompressed data as binary rather than as text when searching. Using the -a flag fixed the problem for me:
zgrep -a 'rdf\.freebase\.com/ns/m\.07ywl>' freebase-rdf.gz
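If you only want to confirm whether a MID is present at all, a small wrapper along these lines may help. This is only a sketch: the function name count_mid is made up for illustration, and it assumes GNU grep and gzip. It uses -F so the dots in the pattern need no escaping, and LC_ALL=C so matching is done byte-wise regardless of locale.

```shell
# Hypothetical helper: count lines that mention a MID in a gzipped RDF dump.
# Usage: count_mid DUMP.gz MID    (for example: count_mid freebase-rdf.gz m.07ywl)
count_mid() {
    dump=$1
    mid=$2
    # -a  treat the input as text even if it contains NUL or other binary-looking bytes
    # -c  print only the number of matching lines
    # -F  match a fixed string, so "." is a literal dot
    gzip -dc -- "$dump" | LC_ALL=C grep -a -c -F -- "rdf.freebase.com/ns/${mid}>"
}
```

A count of 0 with -a in place would suggest the entity really is absent from the file, rather than hidden by binary-file handling.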
Upvotes: 2
Reputation: 10540
That entity exists in freebase-rdf-2015-04-19-00-00.gz, so I'd be pretty surprised if it didn't exist in the final dump from a few months later (2015-08-09), since the database was write-locked for everyone except a few Google admins.
My first guess would be that you have a truncated or corrupted download. Did you check the length and MD5 checksum after download?
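One way to run that check, as a rough sketch: the function name check_dump is made up for illustration, and it assumes GNU coreutils (md5sum) is available. gzip -t reads the whole archive and fails if it is truncated or corrupt. The printed size and MD5 still need to be compared by hand against whatever values were published alongside the dump.

```shell
# Hypothetical helper: test a gzipped dump for truncation and print its size and MD5.
# Usage: check_dump freebase-rdf.gz
check_dump() {
    dump=$1
    # gzip -t decompresses without writing output and fails on damaged archives
    if ! gzip -t -- "$dump" 2>/dev/null; then
        echo "corrupt or truncated: $dump"
        return 1
    fi
    # Size in bytes and MD5 of the compressed file, for comparison with published values
    echo "size: $(wc -c < "$dump")"
    echo "md5: $(md5sum < "$dump" | cut -d' ' -f1)"
}
```

On a file as large as the full dump, both gzip -t and md5sum have to read everything, so expect this to take a while.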
Upvotes: 0