Jarede

Reputation: 3488

Permissions required to restore automated AWS elasticsearch snapshot

I have an AWS OpenSearch/Elasticsearch domain whose cluster is configured to take hourly snapshots. I'm trying to automate restoring a snapshot when the cluster goes down.

At the moment the code receives a timeout when it tries to list the available snapshots in the cs-automated repository:

FailedExecution: Unable to get snapshot information from repository: cs-automated. Error: ConnectionTimeout caused by - ReadTimeout(HTTPSConnectionPool(host='my-domain.eu-west-1.es.amazonaws.com', port=443): Read timed out. (read timeout=10))

The Python 3.8 code looks like this:

import boto3
import curator
import datetime
import json
from elasticsearch import Elasticsearch, RequestsHttpConnection
import os
from requests_aws4auth import AWS4Auth

def handler(event, context):
    ... # get host details for connection 
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
    # Build the Elasticsearch client.
    es = Elasticsearch(
        hosts = [{'host': host, 'port': 443}],
        http_auth = awsauth,
        use_ssl = True,
        verify_certs = True,
        connection_class = RequestsHttpConnection
    )

    index_list = curator.SnapshotList(es, repository="cs-automated")

I have added these iamRoleStatements to my serverless configuration:

    - Effect: Allow
      Resource:
        - arn:aws:es:${aws:region}:${aws:accountId}:domain/${self:custom.domains.${opt:stage}.reportinganalytics}/*
      Action:
        - es:ESHttpGet
    - Effect: Allow
      Resource: arn:aws:s3:::cs-automated/*
      Action:
        - s3:GetObject
    - Effect: Allow
      Resource: arn:aws:s3:::cs-automated
      Action:
        - s3:ListBucket

But this still results in a ConnectionTimeout. Am I missing a permission? When I log the ES client's connection details, the host it uses matches the domain endpoint shown in AWS OpenSearch.

Upvotes: 0

Views: 740

Answers (1)

akos

Reputation: 2644

A ConnectionTimeout usually means that there is no working network route between your Lambda function and the OpenSearch cluster. This is not an IAM policy issue: a missing permission would result in a successful network connection followed by an HTTP 403 response from the AWS API.
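One way to tell a network problem apart from a permissions problem is to test the raw TCP connection before any signed request is sent. The following is a minimal sketch using only the standard library; the function name `can_reach` and its default values are my own choices for illustration, not part of the original code.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    `can_reach` is a hypothetical helper name; it only checks reachability
    and sends no HTTP request, so IAM plays no part in the result.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False
```

Calling `can_reach(host)` at the top of the handler, with the same `host` that is passed to the Elasticsearch client, should show whether the Lambda function can open a connection at all. If it returns False, look at the VPC and security group setup first. If it returns True and the snapshot request still times out, a route exists and the cause is probably elsewhere, for example the 10-second read timeout shown in the error message.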

Your OpenSearch cluster is in a VPC with a security group to control network access to the cluster. I believe your Lambda function is either:

  • not in a VPC so it cannot access private resources
  • or it's in a different VPC without cross-VPC access, so there is no network route to the OpenSearch cluster
  • or it's in the same VPC but the OpenSearch security group doesn't allow inbound network access on port 443

I'd recommend verifying the following:

  • ensure your Lambda function is in the same VPC as your OpenSearch cluster. If it's not in a VPC it cannot access the OpenSearch API. If it's in a different VPC, you need VPC peering or similar cross-VPC access.
  • ensure the security group of your OpenSearch cluster allows inbound TCP traffic on port 443 from the security group ID that is attached to your Lambda function
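The two checks above can also be scripted. Below is a rough sketch, assuming boto3 is available and that the caller is allowed to describe the Lambda function, the domain and its security groups. The helper names (`allows_https_from`, `diagnose`, `fetch_and_diagnose`) are hypothetical, and the rule check only looks at security-group-to-security-group rules, so CIDR-based rules are ignored.

```python
def allows_https_from(ingress_permissions, source_sg_ids):
    """True if any ingress rule allows TCP 443 from one of `source_sg_ids`.

    Only rules that reference a source security group are considered;
    CIDR-based rules are deliberately ignored in this sketch.
    """
    for rule in ingress_permissions:
        proto = rule.get("IpProtocol")
        if proto == "-1":  # "-1" means all protocols and all ports
            port_ok = True
        elif proto == "tcp":
            port_ok = rule.get("FromPort", 0) <= 443 <= rule.get("ToPort", 65535)
        else:
            port_ok = False
        if not port_ok:
            continue
        allowed = {pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])}
        if allowed & set(source_sg_ids):
            return True
    return False


def diagnose(lambda_vpc, domain_vpc, ingress_permissions):
    """Return a short description of the first problem found."""
    lambda_vpc = lambda_vpc or {}
    if not lambda_vpc.get("VpcId"):
        return "Lambda function is not attached to a VPC"
    if lambda_vpc["VpcId"] != domain_vpc.get("VPCId"):
        return "Lambda function and domain are in different VPCs"
    if not allows_https_from(ingress_permissions, lambda_vpc.get("SecurityGroupIds", [])):
        return "Domain security group does not allow TCP 443 from the Lambda security group"
    return "No obvious VPC or security group problem found"


def fetch_and_diagnose(function_name, domain_name):
    """Collect the three inputs for diagnose() from AWS (needs credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    lambda_vpc = boto3.client("lambda").get_function_configuration(
        FunctionName=function_name
    ).get("VpcConfig")
    domain_vpc = boto3.client("opensearch").describe_domain(
        DomainName=domain_name
    )["DomainStatus"].get("VPCOptions", {})
    sg_ids = domain_vpc.get("SecurityGroupIds", [])
    groups = (
        boto3.client("ec2").describe_security_groups(GroupIds=sg_ids)["SecurityGroups"]
        if sg_ids
        else []
    )
    ingress = [rule for group in groups for rule in group.get("IpPermissions", [])]
    return diagnose(lambda_vpc, domain_vpc, ingress)
```

Running `fetch_and_diagnose` with your own function and domain names from a machine with suitable credentials prints nothing by itself; it returns a one-line message you can log. A "different VPCs" result is not necessarily fatal if VPC peering or similar cross-VPC access is in place, which this sketch does not inspect.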

This AWS page might also help to debug the connectivity issue: Configuring a Lambda function to access resources in a VPC

Hope this helps, let me know how it goes!

Upvotes: 1
