Skippy le Grand Gourou

Reputation: 7694

How do I chain two Apify actors?

I need to scrape a list of URLs obtained from a Google search, using the Apify platform.

My plan is to start from a Google Search Scraper actor task. However, I don't think it can be used to scrape anything other than the Google search results (maybe I'm wrong?). I therefore need to pass its output to another actor task, e.g. a Web Scraper or a Puppeteer Scraper.

But I can't seem to find any documentation on chaining actors. How should I proceed?

Update:

I found "How to pass data from crawler to actor", and setting an ACTOR.RUN.SUCCEEDED webhook that calls the Run task API endpoint of the second actor seems to work (that is, the second actor is launched).

However, I can't find how to pass the first actor's dataset to the second actor. Since the Start URLs field is mandatory, I guess I should set it to the dataset, but the dataset link is different for each run…

Upvotes: 1

Views: 1714

Answers (1)

Ondra Urban

Reputation: 677

You can chain multiple actor runs either via the Metamorph feature, or using Webhooks.

Metamorph

Metamorph allows you to run an actor and, while it is running, "morph" it into a different actor with a custom input. The original actor is stopped and replaced by the second one, but both use the same storages, share the same run ID and are displayed as a single actor run in the Apify app. You can use metamorph multiple times in a single run.

You can find the documentation for Metamorph here.
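
As a rough illustration, a metamorph can also be requested over the HTTP API. The sketch below only builds the request and does not send it. It assumes the "Metamorph run" endpoint of the Apify API v2 (POST /v2/actor-runs/{runId}/metamorph with a targetActorId query parameter), so check the current API reference before relying on it. The function name and the placeholder values are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_metamorph_request(run_id, target_actor_id, actor_input, token):
    """Build (but do not send) a request that morphs a run into another actor.

    Assumption: the endpoint path and the targetActorId parameter follow the
    Apify API v2 "Metamorph run" endpoint. The new actor's input goes in the
    request body as JSON.
    """
    query = urlencode({"targetActorId": target_actor_id})
    url = f"https://api.apify.com/v2/actor-runs/{run_id}/metamorph?{query}"
    return Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
req = build_metamorph_request(
    "RUN_ID", "TARGET_ID", {"startUrls": [{"url": "https://example.com"}]}, "MY_TOKEN"
)
```

Passing the request to urllib.request.urlopen would send it. Inside an actor you would normally call the metamorph function of the Apify SDK instead of the raw API.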

Webhooks

Webhooks allow you to call an arbitrary API endpoint once an actor run reaches a given status, for example SUCCEEDED. You can use this to call the Run Actor API to start another actor. You can set a custom payload for the webhook. However, passing the output directly as the webhook payload is not supported at the moment, so you'll need to pass the ID of the key-value store or dataset where your results are stored and read the results from there.

See the Webhooks docs here.

For example, to get the IDs of both the default key-value store and the default dataset of the original actor run, you would configure a payload template like this:

{
    "datasetId": {{resource.defaultDatasetId}},
    "keyValueStoreId": {{resource.defaultKeyValueStoreId}}
}
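
On the receiving side, the second actor gets this payload as its input and can resolve the dataset ID to a download URL. A minimal sketch, assuming the payload arrives as a parsed JSON object and that the results are read through the "Get items" endpoint of the Apify datasets API. Both function names are hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_id_from_payload(payload):
    """Extract the dataset ID sent by the webhook payload template above."""
    dataset_id = payload.get("datasetId")
    if not dataset_id:
        raise ValueError("webhook payload does not contain a datasetId")
    return dataset_id


def dataset_items_url(dataset_id, fmt="json"):
    """Build the URL from which the items of a dataset can be downloaded."""
    query = urlencode({"format": fmt})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


# Placeholder IDs, for illustration only.
payload = {"datasetId": "abc123", "keyValueStoreId": "kv456"}
items_url = dataset_items_url(dataset_id_from_payload(payload))
```

Because the ID is filled in by the webhook on every run, the second actor does not need to know the dataset link in advance.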

Passing data from Google Search Scraper to Web Scraper

The task is not trivial because the Google Search output format is not compatible with the Web Scraper input format. The best way to do this is to create an intermediary actor that uses the output from Google Search Scraper to produce an input for Web Scraper and then metamorph into it. So the final flow is:

Google Search Scraper --webhook--> Output Processor Actor --metamorph--> Web Scraper.
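
The conversion step inside the Output Processor Actor could look roughly like the sketch below. It assumes that each Google Search Scraper dataset item is one results page with an organicResults list whose entries carry a url field, and that Web Scraper expects startUrls as a list of objects with a url key. Verify both against the current actor documentation. The function name is hypothetical.

```python
def build_start_urls(search_pages):
    """Turn Google Search Scraper items into a Web Scraper startUrls list.

    Assumption: each item looks like
    {"organicResults": [{"url": "...", "title": "..."}, ...]}.
    Duplicate URLs are dropped and the original order is kept.
    """
    seen = set()
    start_urls = []
    for page in search_pages:
        for result in page.get("organicResults", []):
            url = result.get("url")
            if url and url not in seen:
                seen.add(url)
                start_urls.append({"url": url})
    return start_urls


# Made-up sample data in the assumed output shape.
pages = [
    {"organicResults": [{"url": "https://a.example"}, {"url": "https://b.example"}]},
    {"organicResults": [{"url": "https://a.example"}, {"title": "no url here"}]},
]
start_urls = build_start_urls(pages)
```

The resulting list would go into the startUrls field of the input that the processor passes to Web Scraper when it metamorphs, together with the other fields Web Scraper requires, such as the page function.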

Upvotes: 3
