johan855

Reputation: 1626

Listing extractors from import.io

I would like to know how to get the crawl data (the list of URLs entered manually through the GUI) from my import.io extractors. The API documentation is sparse, and it does not specify whether the GET requests I make actually start a crawler (and consume one of my available crawler runs) or just query the results of manually launched crawlers.

I would also like to know how to obtain the connector ID. As I understand it, an extractor is nothing more than a specialized connector, but when I use the extractor_id as the connector ID when querying the API, I get an error saying the connector does not exist.

One way I thought I could list the URLs in one of my extractors is this:

https://api.import.io/store/connector/_search?_sortDirection=DESC&_default_operator=OR&_mine=true&_apikey=123...

But the only result I get is:

{ "took": 2, "timed_out": false, "hits": { "total": 0, "hits": [], "max_score": 0 } }

Even if I got a more complete response, the example result in the documentation does not mention any kind of list or element containing the URLs I'm trying to get from my import.io account.

I am using Python to call this API.

Upvotes: 1

Views: 237

Answers (1)

Blake Burkett

Reputation: 51

The legacy API will not work for any non-legacy connectors, so you will have to use the new Web Extractor API. Unfortunately, there is no documentation for this.

Luckily, with some snooping you can find the following call, which lists the extractors associated with your API key:

https://store.import.io/store/extractor/_search?_apikey=YOUR_API_KEY

From here, check each hit and verify that its _type property is set to EXTRACTOR. Each matching hit gives you access to, among other things, the GUID associated with the extractor and the name you chose for it when you created it.
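Since you mentioned Python, here is a minimal sketch of the search call and the _type check described above. It assumes the response has the same "hits" -> "hits" nesting as the JSON shown in the question, which I have not seen confirmed for this endpoint. The function names are just illustrative. It returns the raw hits, so you can inspect them to see which fields hold the GUID and the name.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://store.import.io/store/extractor/_search"


def build_search_url(api_key):
    """Build the search URL with the API key as the _apikey query parameter."""
    return SEARCH_URL + "?" + urllib.parse.urlencode({"_apikey": api_key})


def filter_extractors(response):
    """Keep only the hits whose _type is EXTRACTOR.

    Assumption: hits are nested under response["hits"]["hits"], as in the
    JSON shown in the question.
    """
    hits = response.get("hits", {}).get("hits", [])
    return [hit for hit in hits if hit.get("_type") == "EXTRACTOR"]


def list_extractors(api_key):
    """Call the search endpoint and return the raw EXTRACTOR hits."""
    with urllib.request.urlopen(build_search_url(api_key)) as resp:
        return filter_extractors(json.load(resp))
```

Calling list_extractors("YOUR_API_KEY") and printing the result should show you what each hit actually contains.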

You can then do the following to download the latest run from the extractor in CSV format:

https://data.import.io/extractor/{{GUID}}/csv/latest?_apikey=YOUR_API_KEY

This was found in the Integrations tab of every Web Extractor. There are other queries there as well.
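The CSV download above could be sketched in Python roughly like this. The function names are mine, and decoding the body as UTF-8 is an assumption. The column names come from whatever header row your extractor produces, so nothing here depends on them.

```python
import csv
import io
import urllib.parse
import urllib.request


def build_csv_url(guid, api_key):
    """Build the 'latest run as CSV' URL for a single extractor GUID."""
    path = urllib.parse.quote(guid, safe="")
    query = urllib.parse.urlencode({"_apikey": api_key})
    return f"https://data.import.io/extractor/{path}/csv/latest?{query}"


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_latest_rows(guid, api_key):
    """Download the latest run and return its rows.

    Assumption: the body is UTF-8; "utf-8-sig" also strips a BOM if present.
    """
    with urllib.request.urlopen(build_csv_url(guid, api_key)) as resp:
        return parse_csv(resp.read().decode("utf-8-sig"))
```

If the CSV includes a column with the crawled URL, that may be the easiest way to recover the URL list you are after, but check the header row of your own download first.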

Hope this helps.

Upvotes: 1
