Tom Pester

Reputation: 3813

SQL for the web

Does anyone have experience with a query language for the web?

I am looking for a project, commercial or not, that does a good job of making a webpage queryable and that can even follow the links on it to aggregate information from a group of pages.

I would prefer a SQL- or LINQ-like syntax. I could of course download a webpage and run some XPath against it, but I'm looking for a solution that offers a nicer abstraction.

I found WebSQL:

http://www.cs.utoronto.ca/~websql/

It looks good, but I'm not into Java. An example query:

SELECT a.label
FROM Anchor a SUCH THAT base = "http://www.SomeDoc.html"
WHERE a.href CONTAINS ".ps.Z";
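
For comparison, here is a rough sketch of what that query asks for, written in Python with only the standard library's `html.parser`. It is not WebSQL and does not follow links. The `AnchorCollector` class, the `select_labels` helper, and the sample HTML are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, label) pairs for every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # finished (href, label) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # A missing or valueless href is treated as an empty string.
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def select_labels(html, needle):
    """Roughly: SELECT a.label FROM Anchor a WHERE a.href CONTAINS needle."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [label for href, label in parser.anchors if needle in href]


# Hypothetical sample page; in practice the HTML would be downloaded first.
SAMPLE_HTML = """
<html><body>
  <a href="papers/websql.ps.Z">WebSQL paper</a>
  <a href="index.html">Home</a>
  <a href="papers/thesis.ps.Z">Thesis</a>
</body></html>
"""

labels = select_labels(SAMPLE_HTML, ".ps.Z")
print(labels)
```

This covers only the single-page case; aggregating across pages would mean fetching each collected `href` and feeding it through the same helper.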

Are there others out there?

Is there a library that can be used in a .NET language?

Upvotes: 4

Views: 268

Answers (4)

Colin Pickard

Reputation: 46643

Beautiful Soup and Hpricot are the canonical HTML-parsing libraries for Python and Ruby respectively.

For C#, I have used and appreciated HTML Agility Pack. It does an excellent job of turning messy, invalid HTML into queryable goodness.

There is also this C# HTML parser, which looks good, but I have not tried it.

Upvotes: 2

Pistos

Reputation: 23792

See Hpricot (a Ruby library).

require 'hpricot'
require 'open-uri'  # lets open() fetch URLs; on Ruby 3.0+ use URI.open instead

# load the RedHanded home page
doc = Hpricot(open("http://redhanded.hobix.com/index.html"))
# change the CSS class on links
(doc/"span.entryPermalink").set("class", "newLinks")
# remove the sidebar
(doc/"#sidebar").remove
# print the altered HTML
puts doc

It supports querying with CSS or XPath selectors.
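
To illustrate the XPath style of querying without Hpricot, here is a hedged sketch in Python using the standard library's `xml.etree.ElementTree`. It is a different tool, not Hpricot's API: ElementTree supports only a limited XPath subset and, unlike Hpricot, needs well-formed markup. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (ElementTree cannot parse messy HTML).
SAMPLE = """
<html><body>
  <div id="sidebar"><span class="entryPermalink">side</span></div>
  <div id="content">
    <span class="entryPermalink">first</span>
    <span class="other">second</span>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# XPath-style query: every <span> whose class attribute is "entryPermalink".
links = root.findall(".//span[@class='entryPermalink']")
for span in links:
    span.set("class", "newLinks")

# Remove the sidebar. ElementTree elements do not know their parent,
# so walk a snapshot of the tree and remove the child from its parent.
for parent in list(root.iter()):
    for child in list(parent):
        if child.get("id") == "sidebar":
            parent.remove(child)

# Text of the re-classed spans that are still in the tree.
remaining = [s.text for s in root.findall(".//span[@class='newLinks']")]
print(remaining)
```

For real-world, non-well-formed pages, a tolerant parser such as the ones named in the other answers is the more realistic choice.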

Upvotes: 3

Sklivvz

Reputation: 31133

You are probably looking for SPARQL. It doesn't let you parse pages, but it is designed to solve the same problem (i.e. getting data out of a site, "from the cloud"). It is a W3C standard, but unfortunately Microsoft apparently does not support it yet.

Upvotes: 1

Greg Hewgill

Reputation: 992965

I'm not sure whether this is exactly what you're looking for, but Freebase is an open database of information with a programmatic query interface.

Upvotes: 0
