Mazzespazze

Reputation: 111

Is it possible to get page titles from the web version of the Common Crawl API?

I am trying to get URLs, titles and languages from web pages. Fortunately, Common Crawl (CC) provides an index API: https://github.com/webrecorder/pywb/wiki/CDX-Server-API#api-reference. Sadly, I have not found a way to get the titles as well.

At the moment I query CC with, for example, http://index.commoncrawl.org/CC-MAIN-2018-47-index?url=www.example.com/*&output=json, which returns the "url" and "languages" fields.

Is there any way to get the titles by querying CC through the API, without downloading every WARC file?

Thanks!

Upvotes: 1

Views: 202

Answers (1)

Sebastian Nagel

Reputation: 2239

No. The page title isn't indexed in Common Crawl's URL index (neither in the CDX index nor the columnar index).
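
That said, you do not need to download whole WARC files to read a title. Each index result also carries "filename", "offset" and "length" fields, so a single record can be fetched with an HTTP range request and the title parsed from its HTML. Below is a minimal sketch, not a definitive implementation. It assumes the public data host is https://data.commoncrawl.org/, that the record is a "response" record whose payload is stored uncompressed, and that the page decodes reasonably as UTF-8. The helper names (`byte_range`, `split_warc_record`, `extract_title`, `fetch_title`) are made up for illustration.

```python
import gzip
import urllib.request
from html.parser import HTMLParser

# Assumption: public HTTP endpoint for Common Crawl data files.
DATA_HOST = "https://data.commoncrawl.org/"


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the whitespace-normalized page title, or None if absent."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None


def byte_range(record):
    """Build the Range header value for one index record."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1  # HTTP ranges are inclusive
    return "bytes=%d-%d" % (start, end)


def split_warc_record(raw):
    """Return the HTTP body of a decompressed WARC response record.

    The record is WARC headers, then HTTP headers, then the body,
    each separated by a blank line.
    """
    parts = raw.split(b"\r\n\r\n", 2)
    return parts[2] if len(parts) == 3 else b""


def fetch_title(record):
    """Fetch one WARC record by byte range and extract its title."""
    request = urllib.request.Request(
        DATA_HOST + record["filename"],
        headers={"Range": byte_range(record)},
    )
    with urllib.request.urlopen(request) as response:
        # Each WARC record is its own gzip member.
        raw = gzip.decompress(response.read())
    body = split_warc_record(raw)
    return extract_title(body.decode("utf-8", errors="replace"))
```

Here `record` is one line of the index output parsed with `json.loads`. You would call `fetch_title(record)` once per result, so the cost is one small request per page rather than one full WARC download.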

Upvotes: 3
