Reputation: 1
I have downloaded the Freebase dump from https://developers.google.com/freebase/data?hl=en. I know the format of the dump is <subject> <predicate> <object>, but I am confused about how the records in the file relate to each other. I have two questions:
1. I want to extract the electronic products subset of Freebase, including main properties such as /produced by, /product type, etc. For example, in the subset a record for the iPhone would contain the iPhone's design company, its generations, etc. I am trying to use Cygwin to extract it. How can I write the zgrep script?
2. Once I have this subset in *.gz format, how can I query a specific topic using SPARQL or another efficient language in a Windows environment? For example, querying for the iPhone's design company. I know that even the subset is a large RDF file. Is this achievable?
I really need someone to tell me whether I can do this or not. Thanks.
Upvotes: 0
Views: 130
Reputation: 347
With RDF, you should first decide where to store the data you've downloaded in the archive. I assume you want something simple: download and install Apache Jena. If you want an HTTP interface (rather than command-line tools), consider Jena Fuseki.
To query the data you need to understand SPARQL. If you are familiar with SQL, learning SPARQL should not take you longer than a few hours. If you have particular questions about what to achieve, ask them on SO again.
With these tools in hand, you can tackle any RDF file, even one with billions of triples.
Upvotes: 0
Reputation: 10540
Since the Freebase web site is still alive, despite the threats to shut it down, the first thing I'd do is check to see if it's likely to have the information that you want:
https://www.freebase.com/search?query=iphone&any=%2Fcommon%2Ftopic
https://www.freebase.com/m/0c0bg9c
If you decide you want to extract a subset, there are two options. You could write a small program which takes advantage of the fact that the dump is sorted by subject ID: it buffers the current subject's predicates until it can decide whether the subject matches your criteria. Alternatively, you could use something like zgrep with two passes: one to extract the subject IDs which match, and a second to get all predicates for those subject IDs.
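As a rough illustration of the first option, here is a minimal Python sketch of the buffering approach. It rests on a few assumptions that are not confirmed above: that each line of the dump is tab-separated with the subject in the first field, and that a subject counts as a match when any of its lines mentions `business.product_line`. The function names and the matching rule are hypothetical, so replace `is_product_line` with whatever criterion fits your data.

```python
import gzip
from itertools import groupby


def subject_of(line):
    # Assumption: fields are tab-separated and the subject comes first.
    return line.split("\t", 1)[0]


def extract_subset(lines, matches):
    """Yield all lines of every subject that has at least one matching line.

    Relies on the input being sorted by subject, so that each subject's
    lines are contiguous and only one subject is buffered at a time.
    """
    for _, group in groupby(lines, key=subject_of):
        block = list(group)  # buffer the current subject's triples
        if any(matches(line) for line in block):
            yield from block


def is_product_line(line):
    # Hypothetical criterion: the line mentions the product line type.
    return "business.product_line" in line


def extract_file(src, dst, matches=is_product_line):
    # Stream from one gzip file to another without unpacking to disk.
    with gzip.open(src, "rt", encoding="utf-8") as fin, \
         gzip.open(dst, "wt", encoding="utf-8") as fout:
        fout.writelines(extract_subset(fin, matches))
```

Calling `extract_file("freebase-rdf-latest.gz", "products.gz")` (both file names are placeholders) would stream through the dump once. Memory use stays bounded by the largest single subject, which is why this can be preferable to two zgrep passes over a very large file.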
Freebase has a /business/product_line/category property which might nominally identify electronic products, but I don't think it's well enough populated to be useful.
Upvotes: 0