Christoffer

Reputation: 2411

Which Google Cloud function is preferable to fetch data from external API into GCP?

This should be a very easy question, but I can't wrap my head around what to use. I would like to create a data pipeline that fetches data from an external API (for example, the Spotify API), performs some fairly simple data cleaning on it, and then either writes a JSON file to Cloud Storage or loads the data into BigQuery.

As far as I understand, I can use Composer to do it, using DAGs etc., but what I need here is something simpler and more lightweight (mainly UI based) that costs less than Composer and is easier to use. What I am looking for is something like Data Factory in Azure.

So, in brief:

Can I handle all of this with one GCP application or do I need to use combinations like Cloud Scheduler, Cloud Functions etc?

Upvotes: 1

Views: 1290

Answers (1)

guillaume blaquiere

Reputation: 75735

As always, you have several options...

Cloud Scheduler seems to be a requirement to trigger the process regularly (as often as every minute).

Then, you have 2 options:

  • Code the process: call the API, transform/clean the data, and sink the data into the destination
  • Use Cloud Workflows: you can define the API calls that you want to make
    • Call the API
    • Store the raw data in BigQuery (also an API call; connectors simplify the process)
    • Run a query in BigQuery to clean/format your data and store it in a final table (also an API call)

You can also mix the two: use Cloud Functions to get the data, then clean/format it with a query in BigQuery.
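To make the "code the process" option concrete, here is a minimal sketch of the fetch / clean / serialize steps using only the Python standard library. It assumes the external API returns a JSON array of objects with `id` and `name` fields; those field names, and the function names `fetch_json`, `clean_records`, `to_ndjson` and `run`, are hypothetical and only there for illustration. The actual write to Cloud Storage or BigQuery is left out.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Call the external API and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def clean_records(raw_items):
    """Keep only the fields we need and drop incomplete rows."""
    cleaned = []
    for item in raw_items:
        name = str(item.get("name") or "").strip()
        if not name or item.get("id") is None:
            continue  # skip rows missing a required field
        cleaned.append({"id": str(item["id"]), "name": name})
    return cleaned


def to_ndjson(records):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in records)


def run(url):
    """Entry point that a Cloud Function handler could call."""
    return to_ndjson(clean_records(fetch_json(url)))
```

Newline-delimited JSON is used here because BigQuery can load it directly, so the same payload works whether you drop it in a bucket or load it into a table.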


Doing something that specific without starting from scratch is difficult.


EDIT 1

If you have a look at the documentation, you can see this sample:

- getCurrentTime:
    call: http.get
    args:
      url: https://us-central1-workflowsample.cloudfunctions.net/datetime
    result: currentTime
- readWikipedia:
    call: http.get
    args:
      url: https://en.wikipedia.org/w/api.php
      query:
        action: opensearch
        search: ${currentTime.body.dayOfTheWeek}
    result: wikiResult
- returnResult:
    return: ${wikiResult.body[1]}

The first step, getCurrentTime, performs an external call and stores the result in result: currentTime.

In the next step, you can reuse the result currentTime and pass only the value that you want to another API call.

And you can chain steps like that.

If you need authentication, you can call Secret Manager to get the secret values and then reuse the result of that call in subsequent steps.

For an easier connection to Google APIs, you can use connectors.

Upvotes: 1
