longtimelurker42

Reputation: 23

How do I use Common Crawl to search the web for a given keyword query?

Common Crawl is a non-profit third party web search engine. http://commoncrawl.org

I can see that there is an API for searching Common Crawl by domain.

How can I search Common Crawl for a given search term?

Upvotes: 1

Views: 1239

Answers (1)

Julien Nioche

Reputation: 4864

You can't currently search the content of the web pages. There was Common Search, which used the CC datasets, but I am not sure how up to date it is. If you are looking for a limited set of keywords, you could use MapReduce or Spark to filter the pages. If you are dealing with an open or arbitrary set of queries, the best approach would be to index the datasets into Elasticsearch or Solr yourself.
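
To give a rough idea of the keyword-filtering option, here is a minimal plain-Python sketch of the per-page filter step. It is only an illustration under some assumptions: `matching_pages` is a hypothetical helper name, the input is assumed to be (url, text) pairs that you have already extracted from the crawl's text files, and only single-word keywords are handled. In a real MapReduce or Spark job the same logic would run as the map/filter function over the whole dataset.

```python
import re
from typing import Iterable, Iterator, Set, Tuple

# Splits text into word tokens; a deliberately simple tokenizer.
WORD_RE = re.compile(r"\w+")


def matching_pages(
    records: Iterable[Tuple[str, str]],
    keywords: Iterable[str],
) -> Iterator[Tuple[str, Set[str]]]:
    """Yield (url, matched_keywords) for pages containing any keyword.

    `records` is assumed to be (url, extracted_text) pairs; how you
    obtain them from the crawl files is outside this sketch.
    Matching is case-insensitive and works on whole words only.
    """
    wanted = {k.lower() for k in keywords}
    for url, text in records:
        words = (w.lower() for w in WORD_RE.findall(text))
        found = wanted.intersection(words)
        if found:
            yield url, found


if __name__ == "__main__":
    # Made-up sample records, purely for demonstration.
    sample = [
        ("http://a.example/", "Common Crawl publishes web data"),
        ("http://b.example/", "Nothing relevant here"),
    ]
    for url, hits in matching_pages(sample, ["crawl", "solr"]):
        print(url, sorted(hits))
```

This only scales to a fixed, small keyword list, because every new query means another full pass over the data. That is why indexing is the better fit for arbitrary queries.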

Upvotes: 3
