Mike

Reputation: 21659

Duplicate events in Eventarc triggered Google Cloud Run service

I have created a Google Cloud Run service that performs a BigQuery ETL operation in response to a BigQuery event being written to the audit log. My service is written as a Python Flask app and follows the principles given in "How to trigger Cloud Run actions on BigQuery events". More specifically, the service is triggered by Eventarc when Google Analytics data are imported into BigQuery.

I can test this locally by starting the app in a Docker container and sending the service a POST request that contains JSON from an appropriate audit log entry. It works as expected: the ETL operation is performed and no errors are returned.
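For reference, a local test request like this can be sketched with Python's standard library. This is a minimal sketch under assumptions: the container listens on localhost:8080 (the port shown in the service description below), the body is a trimmed-down audit log entry, and the ce-* header values (event id, source) are placeholders, not values copied from a real Eventarc delivery. The helper name build_local_event_request is hypothetical.

```python
import json
import urllib.request

# Hypothetical local endpoint: the container port from the service description.
LOCAL_URL = "http://localhost:8080/"


def build_local_event_request(log_entry: dict, url: str = LOCAL_URL) -> urllib.request.Request:
    """Build a POST that mimics an Eventarc audit-log event in CloudEvents binary mode.

    CloudEvent attributes travel as ce-* headers; the body is the log entry JSON.
    """
    body = json.dumps(log_entry).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "ce-id": "local-test-1",  # placeholder event id
        "ce-specversion": "1.0",
        # Placeholder source; check a real delivered event for the exact value.
        "ce-source": "//cloudaudit.googleapis.com/projects/XXX/logs/activity",
        "ce-type": "google.cloud.audit.log.v1.written",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending it is then a matter of urllib.request.urlopen(req, timeout=120); a generous client timeout matters here because the handler takes about a minute.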

The app deploys to Google Cloud without issue. Eventarc correctly triggers the service when the Google Analytics import is complete. The service runs as expected, correctly performing the ETL operation and returning a 200 OK response. But then the service is repeatedly invoked with the same event. This loop only stops when the next Eventarc trigger is activated.

The ETL operation takes approximately 60 seconds. If I replace the ETL operation with a time.sleep(60) statement, the same problem occurs, and it also occurs with a 10-second sleep. However, if I remove both the ETL operation and the sleep, the retry loop does not occur.

Finally, the Metrics Explorer shows a series of webhook_timeout responses for "Cloud Pub/Sub Subscription - Push Requests".

All of this suggests to me that "the system" is retrying the event because it is taking too long. But why? And how do I fix it?

$ gcloud run services describe XXX-svc
✔ Service XXX-svc in region XXX

URL:     https://XXX
Ingress: internal
Traffic:
  100% LATEST (currently XXX)

Last updated on 2022-08-04T08:27:05.918172Z by XXX:
  Revision XXX
  Image:           XXX
  Port:            8080
  Memory:          512Mi
  CPU:             1000m
  Service account: XXX
  Concurrency:     80
  Min Instances:   1
  Max Instances:   1
  Timeout:         300s

$ gcloud --project="${PROJECT}" eventarc triggers describe XXX-trigger --location=XXX
createTime: '2022-08-04T06:59:33.232085395Z'
destination:
  cloudRun:
    region: XXX
    service: XXX-svc
eventFilters:
- attribute: resourceName
  operator: match-path-pattern
  value: projects/XXX/jobs/*
- attribute: type
  value: google.cloud.audit.log.v1.written
- attribute: serviceName
  value: bigquery.googleapis.com
- attribute: methodName
  value: google.cloud.bigquery.v2.JobService.InsertJob
name: projects/XXX/locations/XXX/triggers/XXX-trigger
serviceAccount: XXX
transport:
  pubsub:
    subscription: projects/XXX/subscriptions/eventarc-XXX-XXX-trigger-sub-724
    topic: projects/XXX/topics/eventarc-XXX-XXX-trigger-724
uid: XXX
updateTime: '2022-08-04T10:15:33.683873843Z'

Update

Thanks to the accepted answer from @guillaume blaquiere and the comment from @Pentium10, I was able to update the Pub/Sub subscription acknowledgement deadline:

# List Eventarc trigger names.
gcloud \
  --project="${PROJECT}" \
  eventarc triggers list \
  --format='value(name)'

TRIGGER="..."

# Get the Eventarc trigger Pub/Sub subscription name.
PUBSUB=$(gcloud \
  --project="${PROJECT}" \
  eventarc triggers describe "${TRIGGER}" \
  --format='value(transport.pubsub.subscription)')

# Describe the subscription.
gcloud \
  --format=json \
  pubsub subscriptions describe "${PUBSUB}"

# Update the acknowledgement deadline.
gcloud \
  pubsub subscriptions update "${PUBSUB}" \
  --ack-deadline=300
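If the update is part of a Python deploy script rather than a shell session, the last command could be wrapped as below. This is only a sketch: the helper name ack_deadline_update_command is hypothetical, and the only facts it encodes are the gcloud flag used above and Pub/Sub's allowed range of 10 to 600 seconds for ack deadlines.

```python
from typing import List

# Pub/Sub accepts acknowledgement deadlines from 10 to 600 seconds.
MIN_ACK_DEADLINE_S = 10
MAX_ACK_DEADLINE_S = 600


def ack_deadline_update_command(subscription: str, ack_deadline_s: int) -> List[str]:
    """Build the argv for `gcloud pubsub subscriptions update SUB --ack-deadline=N`.

    Raises ValueError for deadlines that Pub/Sub would reject.
    """
    if not MIN_ACK_DEADLINE_S <= ack_deadline_s <= MAX_ACK_DEADLINE_S:
        raise ValueError(
            f"ack deadline must be between {MIN_ACK_DEADLINE_S} and "
            f"{MAX_ACK_DEADLINE_S} seconds, got {ack_deadline_s}"
        )
    return [
        "gcloud", "pubsub", "subscriptions", "update", subscription,
        f"--ack-deadline={ack_deadline_s}",
    ]
```

subprocess.run(ack_deadline_update_command(sub, 300), check=True) would then apply it. Passing an argument list rather than a shell string avoids quoting problems with the subscription path.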

Upvotes: 4

Views: 2320

Answers (3)

nprime496

Reputation: 185

From @guillaume blaquiere's answer: "But because it's backed by Pub/Sub, you can update the Pub/Sub subscription and its acknowledgement deadline. The name of the subscription is eventarc-<REGION or GLOBAL>-<EVENTARC NAME>-sub-<Random suffix>"

In case it helps someone, here is my Terraform implementation of the solution proposed by @guillaume blaquiere.

I use a null_resource to update the ack deadline to 10 minutes (600 s, the maximum possible at the time of this post).



resource "google_eventarc_trigger" "my_trigger" {
  # some configuration
}

locals {
  # Full path has the form projects/<project>/subscriptions/<name>.
  subscription_full_path = google_eventarc_trigger.my_trigger.transport[0].pubsub[0].subscription
  subscription_parts     = split("/", local.subscription_full_path)
  subscription_name      = local.subscription_parts[length(local.subscription_parts) - 1]
}

# Eventarc exposes no ack-deadline setting, so patch the underlying
# Pub/Sub subscription with gcloud once the trigger exists.
resource "null_resource" "update_ack_deadline" {
  provisioner "local-exec" {
    command = <<-EOT
      gcloud pubsub subscriptions update ${local.subscription_name} --ack-deadline=600
    EOT
  }

  depends_on = [
    google_eventarc_trigger.my_trigger
  ]

  # Re-run the update whenever the trigger changes (requires Terraform 1.2+).
  lifecycle {
    replace_triggered_by = [
      google_eventarc_trigger.my_trigger,
    ]
  }
}
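The same path handling as the locals block, written as a small Python helper for anyone scripting this outside Terraform. This is a sketch with a hypothetical name, split_subscription_path; it also returns the project, since the gcloud command above relies on whichever project the local gcloud configuration points at, and passing it explicitly with the global --project flag removes that dependency.

```python
from typing import Tuple


def split_subscription_path(full_path: str) -> Tuple[str, str]:
    """Split 'projects/<project>/subscriptions/<name>' into (project, name).

    Equivalent to taking the last segment with split()/length() in Terraform,
    but it also validates the shape of the path.
    """
    parts = full_path.split("/")
    if len(parts) != 4 or parts[0] != "projects" or parts[2] != "subscriptions":
        raise ValueError(f"unexpected subscription path: {full_path!r}")
    return parts[1], parts[3]
```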

Upvotes: 1

Banty

Reputation: 971

In case this helps someone: after updating the ack deadline on the Pub/Sub subscription as advised by @Pentium10 and @guillaume blaquiere, I noticed that Pub/Sub still retried delivery twice, at 23s and 49s after the first delivery, even though I had increased the ack deadline to 300s.

It turned out that, by default, Pub/Sub attempts immediate redelivery if a negative acknowledgement is received or the ack deadline expires. According to the Pub/Sub documentation, you can configure exponential backoff instead, to avoid immediate redelivery, which can cause issues.
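To see how the two bounds of the retry policy interact, here is an illustrative doubling schedule clamped to [minimumBackoff, maximumBackoff]. The Pub/Sub documentation describes the delay as exponential backoff between these bounds but, as far as I know, does not promise this exact formula, so the doubling is an assumption; backoff_schedule is a hypothetical name.

```python
from typing import List


def backoff_schedule(minimum_s: float, maximum_s: float, attempts: int) -> List[float]:
    """Illustrative exponential backoff: double the delay on each retry,
    never going above maximum_s.

    NOT Pub/Sub's exact algorithm; it only shows how the minimum and maximum
    backoff bound the delay between redeliveries.
    """
    delays = []
    delay = minimum_s
    for _ in range(attempts):
        delays.append(min(delay, maximum_s))
        delay *= 2
    return delays
```

With the policy shown below (10s to 600s), the first eight delays in this model are 10, 20, 40, 80, 160, 320, 600 and 600 seconds.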

gcloud --format=json pubsub subscriptions describe my-eventarc-subscription
{
  "ackDeadlineSeconds": 300,
  "expirationPolicy": {},
  "labels": {...},
  "messageRetentionDuration": "86400s",
  "name": "full/path/to/my-eventarc-subscription",
  "pushConfig": {...},
  "retryPolicy": {
    "maximumBackoff": "600s",
    "minimumBackoff": "10s"
  },
  "state": "ACTIVE",
  "topic": "full/path/to/my-eventarc-topic"
}
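A describe output like this can also be checked programmatically. The sketch below assumes the JSON comes from the command above and that absent fields mean the defaults (a 10-second ack deadline, and a 10-second minimum backoff when retryPolicy is set); the function names and warning texts are hypothetical.

```python
import json
from typing import List

# Defaults assumed when a field is absent from the describe output.
DEFAULT_ACK_DEADLINE_S = 10
DEFAULT_MIN_BACKOFF = "10s"


def parse_duration_s(value: str) -> float:
    """Parse a duration string such as '600s' or '86400s' into seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported duration: {value!r}")
    return float(value[:-1])


def check_subscription(describe_json: str, processing_s: float) -> List[str]:
    """Return warnings if a message taking processing_s is likely to be redelivered."""
    sub = json.loads(describe_json)
    warnings: List[str] = []
    ack_deadline = sub.get("ackDeadlineSeconds", DEFAULT_ACK_DEADLINE_S)
    if processing_s < ack_deadline:
        return warnings  # the handler responds in time: no timeout-driven retries
    warnings.append(f"processing ({processing_s}s) >= ackDeadlineSeconds ({ack_deadline}s)")
    retry = sub.get("retryPolicy")
    if retry is None:
        warnings.append("no retryPolicy: redelivery is immediate after each timeout")
    else:
        min_backoff = parse_duration_s(retry.get("minimumBackoff", DEFAULT_MIN_BACKOFF))
        # A retry starts roughly ack_deadline + min_backoff after the first delivery.
        if ack_deadline + min_backoff < processing_s:
            warnings.append("a retry can start while the first delivery is still running")
    return warnings
```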

Update the minimumBackoff of retryPolicy:

gcloud pubsub subscriptions update my-eventarc-subscription --min-retry-delay=360s

It is also important that the concurrency configured on the Cloud Run service matches what the instance(s) can handle before ackDeadline and minimumBackoff elapse.

For example, if messages are sporadic and each takes 60s to acknowledge, I would set concurrency to 2 if ackDeadline is set to 180s.
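One way to read that example, under my own pessimistic assumption (not stated in the answer) that N concurrent requests on one instance share the CPU so each takes about N × 60s to finish: the largest safe concurrency is the biggest N with N × processing time strictly below the ack deadline. The helper name is hypothetical.

```python
def max_safe_concurrency(processing_s: float, ack_deadline_s: float) -> int:
    """Largest N such that N * processing_s < ack_deadline_s.

    Assumes N concurrent requests slow each other down so that each takes
    roughly N * processing_s. Returns 0 if even one request would miss the deadline.
    """
    if processing_s <= 0:
        raise ValueError("processing_s must be positive")
    n = int(ack_deadline_s // processing_s)
    if n * processing_s >= ack_deadline_s:
        n -= 1  # strictly below the deadline, so drop the exact-fit case
    return max(n, 0)
```

Under this model, 60s of processing with a 180s deadline gives 2, which matches the example; a 300s deadline would allow 4.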

Upvotes: 0

guillaume blaquiere

Reputation: 75970

That's correct. Eventarc is backed by Pub/Sub, and by default a Pub/Sub subscription expects an acknowledgement within 10 seconds.

That's the default configuration of Eventarc.

Because your event processing takes 60 seconds, the event is redelivered in a loop.
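To make the loop concrete, here is a toy model (my own simplification, not Pub/Sub's actual scheduler) of what happens to a single message when the handler is slower than the ack deadline. It assumes a fixed retry delay and that the Cloud Run handler keeps running after Pub/Sub gives up on the push request, which would explain the ETL being re-executed on every delivery. The function name is hypothetical.

```python
from typing import List


def simulate_push_deliveries(processing_s: float, ack_deadline_s: float,
                             retry_delay_s: float, horizon_s: float) -> List[float]:
    """Return the start times of all deliveries of ONE message before horizon_s.

    A delivery counts as acknowledged only if the handler responds before the
    ack deadline; otherwise the message is redelivered after retry_delay_s.
    """
    if ack_deadline_s <= 0 or retry_delay_s < 0:
        raise ValueError("ack_deadline_s must be > 0 and retry_delay_s >= 0")
    deliveries = []
    t = 0.0
    while t < horizon_s:
        deliveries.append(t)
        if processing_s < ack_deadline_s:
            break  # responded in time: message acknowledged
        t += ack_deadline_s + retry_delay_s  # deadline expired, schedule a retry
    return deliveries
```

With the numbers from the question (60 s of processing, the 10 s default deadline, no retry delay), simulate_push_deliveries(60, 10, 0, 60) gives deliveries at 0, 10, 20, 30, 40 and 50 s; raising the deadline to 300 s leaves a single delivery.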


I had the same issue and shared it with the PM. For now, there is nothing in Eventarc (neither the API nor Terraform, which is what I use) to change that.

But because it's backed by Pub/Sub, you can update the Pub/Sub subscription and its acknowledgement deadline. The name of the subscription is eventarc-<REGION or GLOBAL>-<EVENTARC NAME>-sub-<Random suffix>
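The trigger's transport.pubsub.subscription field (shown in the describe output in the question) is the reliable way to get this name. If you only have a list of subscriptions, for example from gcloud pubsub subscriptions list --format='value(name)', a filter on the naming pattern might look like the sketch below. The helper is hypothetical, and it assumes the pattern holds exactly, with <REGION or GLOBAL> being whatever location string the trigger was created in.

```python
from typing import List


def eventarc_subscription_candidates(subscription_names: List[str],
                                     location: str, trigger: str) -> List[str]:
    """Keep names whose last segment matches eventarc-<location>-<trigger>-sub-<suffix>."""
    prefix = f"eventarc-{location}-{trigger}-sub-"
    matches = []
    for full_name in subscription_names:
        short = full_name.rsplit("/", 1)[-1]  # works for full paths and short names
        if short.startswith(prefix) and short[len(prefix):]:  # suffix must be non-empty
            matches.append(full_name)
    return matches
```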

Upvotes: 7
