perryzheng

Reputation: 199

Given a list of company names, how can I fetch details such as website URL, year established, number of employees, etc.?

I have a list of company names such as Microsoft Corp, Kimberly Clark Corporation etc, and for each company, I would like to retrieve fields such as:

  1. Its company logo
  2. Geographic identifier for Google Maps
  3. Website URL
  4. Year established
  5. Stock exchange and stock exchange ticker symbol
  6. A way to get the stock prices over the last few days
  7. About / abstract from Wikipedia
  8. A list of subsidiaries and parent companies. For instance, for Boeing it would be Jeppesen, Aviall, Inc., etc.

I have looked into SPARQL and DBpedia. Any suggestions on how to write a SPARQL query that retrieves some of this information? (I don't need all the fields, just a couple to get started.)

Thanks!

Upvotes: 4

Views: 2246

Answers (2)

Joshua Taylor

Reputation: 85823

You can start using a query like this:

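# Prefixes declared explicitly so the query also runs on endpoints that do not predefine them.
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

select * where {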
  values ?company { dbpedia:Microsoft
                    <http://dbpedia.org/resource/Apple_Inc.>
                    dbpedia:Kimberly-Clark
                  } 
  OPTIONAL { { ?company dbpprop:logo ?logo  FILTER(isIRI(?logo)) }
             UNION 
             { ?company foaf:depiction ?logo FILTER(isIRI(?logo)) } }
  OPTIONAL { ?company dbpedia-owl:abstract ?abstract 
             FILTER(langMatches(lang(?abstract),"EN")) }
  OPTIONAL { ?company geo:lat ?latitude ;
                      geo:long ?longitude }
  OPTIONAL { ?company dbpedia-owl:foundingDate ?foundingDate }
  OPTIONAL { ?company dbpedia-owl:wikiPageExternalLink ?externalLink }
  OPTIONAL { ?company dbpprop:symbol ?stockSymbol }
  OPTIONAL { ?company dbpedia-owl:subsidiary ?subsidiaryPage }
}

SPARQL Results

I based the query on the properties I saw on the DBpedia pages for Microsoft, Kimberly-Clark, and Apple Inc. The data isn't particularly clean, so I added a few filters to the query, and some caveats remain:

  • Not all of these companies list subsidiaries, and the subsidiary property for Microsoft doesn't point to a subsidiary but to a page that presumably enumerates some subsidiaries.

  • Some of the companies have bad information for the logos (hence the FILTERs with isIRI). For instance, Apple's dbpprop:logo is the integer 150. I think that that comes from the Wikipedia infobox line | logo = [[File:{{#property:p154}}|150px]], where 150 is getting pulled out rather than a more meaningful value. Filtering by isIRI helps a little bit.

  • Some of the companies have multiple founding dates. I'm not sure how you might decide which of them to use.

  • While the company page is usually listed as an external link, not all of the external links associated with a page are the company page. I'm not sure how you could select one as the company page.
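One possible way to handle the multiple founding dates on the client side is to keep the earliest parseable date per company. The sketch below is only a heuristic, not something the data itself recommends. It assumes the query results have already been flattened into dictionaries with `company` and `foundingDate` keys (a hypothetical shape), and the sample rows are made up for illustration.

```python
from datetime import date


def earliest_founding_dates(rows):
    """Map each company IRI to its earliest parseable founding date."""
    earliest = {}
    for row in rows:
        raw = row.get("foundingDate")
        if not raw:
            continue  # the OPTIONAL pattern did not match for this row
        try:
            parsed = date.fromisoformat(raw)
        except ValueError:
            continue  # skip values that are not plain ISO dates
        company = row["company"]
        if company not in earliest or parsed < earliest[company]:
            earliest[company] = parsed
    return earliest


# Illustrative rows only; real results depend on the DBpedia snapshot.
rows = [
    {"company": "http://dbpedia.org/resource/Microsoft", "foundingDate": "1975-04-04"},
    {"company": "http://dbpedia.org/resource/Microsoft", "foundingDate": "1981-06-25"},
    {"company": "http://dbpedia.org/resource/Apple_Inc.", "foundingDate": "not-a-date"},
]
print(earliest_founding_dates(rows))
```

Companies whose values all fail to parse are simply left out of the result, so the caller can tell "no usable date" apart from a real one.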

All that said, it looks like you can get a lot of this information from DBpedia.
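Since the question starts from a list of companies, the VALUES block can be generated instead of typed by hand. The following is a minimal sketch, assuming you already know the DBpedia resource IRI for each company. The function name `build_company_query` is hypothetical, and only two of the OPTIONAL patterns are included to keep it short. Every IRI is written in angle brackets, which sidesteps prefixed names that end in a dot, such as Apple_Inc.

```python
def build_company_query(resource_iris):
    """Return a SPARQL query string covering the given DBpedia resource IRIs."""
    for iri in resource_iris:
        # Reject characters that are not allowed inside <...> in SPARQL.
        if any(ch in iri for ch in '<>"{}|^`\\ '):
            raise ValueError(f"not a safe IRI: {iri!r}")
    values = "\n".join(f"    <{iri}>" for iri in resource_iris)
    return (
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
        "SELECT * WHERE {\n"
        "  VALUES ?company {\n"
        f"{values}\n"
        "  }\n"
        "  OPTIONAL { ?company dbpedia-owl:abstract ?abstract\n"
        '             FILTER(langMatches(lang(?abstract), "en")) }\n'
        "  OPTIONAL { ?company dbpedia-owl:foundingDate ?foundingDate }\n"
        "}\n"
    )


print(build_company_query([
    "http://dbpedia.org/resource/Microsoft",
    "http://dbpedia.org/resource/Apple_Inc.",
    "http://dbpedia.org/resource/Kimberly-Clark",
]))
```

The printed text can be pasted into a SPARQL endpoint's query form or sent with whatever HTTP client you prefer.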

Upvotes: 3

Pierre

Reputation: 35246

You could start with the following SPARQL query. It retrieves all the triples for a subject whose name is "Apple Inc.".

select distinct ?subject ?predicate ?object where { 
  ?subject ?predicate ?object .
  ?subject <http://xmlns.com/foaf/0.1/name> "Apple Inc."@en .
}

SPARQL results

subject     predicate   object
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://www.w3.org/2002/07/owl#Thing
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/ontology/Company
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://www.opengis.net/gml/_Feature
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/ontology/Organisation
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/ontology/Agent
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://schema.org/Organization
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/class/yago/ComputerCompaniesOfTheUnitedStates
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/class/yago/SoftwareCompaniesOfTheUnitedStates
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/class/yago/RetailCompaniesOfTheUnitedStates
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/class/yago/CompaniesEstablishedIn1976
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/class/yago/ComputerHardwareCompanies
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://umbel.org/umbel/rc/Organization
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/class/yago/Company108058098
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/class/yago/HomeComputerHardwareCompanies
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/class/yago/CompaniesBasedInCupertino,California
http://dbpedia.org/resource/Apple_Inc.  http://www.w3.org/1999/02/22-rdf-syntax-ns#type     http://dbpedia.org/class/yago/MobilePhoneManufacturers
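To apply the same name lookup to a whole list of company names, the names can be turned into language-tagged literals and matched in one query. This is a rough sketch: the helper names are hypothetical, and it assumes each name matches the foaf:name value in DBpedia exactly, which will often not hold for variants such as "Microsoft Corp".

```python
def sparql_string_literal(text, lang="en"):
    """Quote text as a SPARQL string literal with a language tag."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def build_name_lookup_query(names):
    """Return a query that maps each exact foaf:name to its subject IRI."""
    literals = " ".join(sparql_string_literal(name) for name in names)
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT DISTINCT ?name ?subject WHERE {\n"
        f"  VALUES ?name {{ {literals} }}\n"
        "  ?subject foaf:name ?name .\n"
        "}\n"
    )


print(build_name_lookup_query(["Apple Inc.", "Microsoft"]))
```

The subjects returned by this lookup are resource IRIs, so they can serve as input to a follow-up query that fetches the specific properties you want.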

Upvotes: 1
