caffeine_inquisitor

Reputation: 727

AWS DataSync from EFS to S3 - connection timed out

I'm trying to create a DataSync task to copy files from EFS to S3, using Terraform. From reading the documentation, it looks like I don't need a DataSync agent to do this. Following the guide at https://ystoneman.medium.com/serverless-datasync-from-efs-to-s3-6cb3a7ab85f7, I have created the following:

resource "aws_security_group" "sg-datasync" { 
  name = "datasync"
  vpc_id = "vpc-sampleVPC"
}
resource "aws_datasync_location_efs" "source_efs" {
  efs_file_system_arn =  "arn:aws:elasticfilesystem:ap-southeast-2:XXXXX:file-system/fs-6b3f3753"
  ec2_config {
    security_group_arns = [aws_security_group.sg-datasync.arn]
    subnet_arn          = "arn:aws:ec2:ap-southeast-2:XXXXX:subnet/subnet-09d919d3b76e9c7f0"
  }
}
resource "aws_datasync_location_s3" "target_s3" {
  s3_bucket_arn = local.s3_arn
  subdirectory  = "/some_target_folder"

  s3_config {
    bucket_access_role_arn = local.s3_bucket_role_arn
  }
}
resource "aws_datasync_task" "sampleTask" {
  destination_location_arn = aws_datasync_location_s3.target_s3.arn
  name                     = "sampleTask"
  source_location_arn      = aws_datasync_location_efs.source_efs.arn

  options {
    bytes_per_second = -1
  }
}

In addition to this, I have created a few more security-related resources:

resource "aws_security_group_rule" "datasync_to_efs" { 
  type                     = "ingress"
  from_port                = 2049
  to_port                  = 2049
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.sg-datasync.id 
  security_group_id        = "sg-049fd2c6708c42c20"
}
resource "aws_security_group_rule" "egress_datasync_to_efs" {
  type                     = "egress"
  from_port                = 0
  to_port                  = 65535
  protocol                 = "tcp"
  source_security_group_id = "sg-049fd2c6708c42c20"
  security_group_id        = aws_security_group.sg-datasync.id
}

Also note that 'sg-049fd2c6708c42c20' is the EFS file system's mount target security group. At least that is what I think it is, based on the screenshot below (this is taken from the EFS network configuration for fs-6b3f3753):

[Screenshot: EFS network configuration]

With these in place, the DataSync task and locations are created successfully. However, when I run the task, the connection times out:

"Task failed to access location loc-0bdebcc42541f73e4: x40016: mount.nfs: Connection timed out"

FYI: loc-0bdebcc42541f73e4 is the source location, and the console shows the following details for it:

sg-0bb0d7ddb3dec8ca6 is the security group 'sg-datasync'. In the console, it has no inbound rules, but it has one outbound rule:

Looking at https://docs.aws.amazon.com/efs/latest/ug/troubleshooting-efs-mounting.html#mount-hangs-fails-timeout, it seems that I didn't configure either the EC2 instance or the mount target security groups correctly. My questions are:

  1. Where is the EC2 instance configuration in my Terraform above? Is it aws_datasync_location_efs.source_efs.ec2_config? My guess is that AWS temporarily spawns an EC2 instance to access the EFS file system and configures it from this block.
  2. Assuming no. 1 is correct, that EC2 instance has been configured with a) the security group 'sg-datasync', and b) the 'datasync_to_efs' rule, which configures the mount target security group (sg-049fd2c6708c42c20) to allow inbound NFS access from the EC2 security group 'sg-datasync'. Is that right?

Any help / pointer is very much appreciated!

Upvotes: 1

Views: 2379

Answers (1)

Noah Harasz

Reputation: 21

  1. AWS doesn't seem to document whether it spins up an EC2 instance on the backend. These settings are EC2-related, though, so it's probably safe to assume that an instance somewhere on the backend uses this config to do the sync.

  2. Yes, it follows that DataSync will use the security group and the rules you specified to access your EFS file system.

Most guides on this recommend using two separate security groups:

  • One for the EFS mount target (which has the "datasync_to_efs" rule). It should look something like this:
    resource "aws_security_group" "efs" {
      name        = "efs"
      description = "Allow traffic from Datasync to EFS"
      vpc_id      = aws_vpc.vpc.id
    }
    resource "aws_security_group_rule" "datasync_ingress" {
      security_group_id        = aws_security_group.efs.id
      description              = "Allow traffic from Datasync to EFS"
      from_port                = 2049
      to_port                  = 2049
      protocol                 = "tcp"
      type                     = "ingress"
      source_security_group_id = aws_security_group.datasync.id
    }
    
  • One for the DataSync location (allowing all egress):
    resource "aws_security_group" "datasync" {
      name        = "datasync"
      description = "Allow all egress traffic from Datasync"
      vpc_id      = aws_vpc.vpc.id
    }
    resource "aws_security_group_rule" "datasync_egress" {
      security_group_id = aws_security_group.datasync.id
      description       = "Allow all egress traffic from Datasync"
      from_port         = 0
      to_port           = 0
      protocol          = "-1"
      type              = "egress"
      # An egress rule names a destination; 0.0.0.0/0 allows all outbound IPv4 traffic.
      cidr_blocks       = ["0.0.0.0/0"]
    }
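
Wiring the two groups into the question's setup might look like the sketch below. It assumes the mount target's security group already exists outside Terraform (as sg-049fd2c6708c42c20 does in the question), so the ingress rule is attached by ID instead of creating a new "efs" group. The variable name efs_mount_target_sg_id is made up for illustration. The subnet passed in ec2_config should be one from which a mount target of the file system is reachable, ideally in the same Availability Zone as one.

```hcl
# Hypothetical variable: ID of the security group already attached to the
# EFS mount targets (sg-049fd2c6708c42c20 in the question).
variable "efs_mount_target_sg_id" {
  type    = string
  default = "sg-049fd2c6708c42c20"
}

# Allow NFS (TCP 2049) into the mount targets from the DataSync group.
resource "aws_security_group_rule" "datasync_to_efs" {
  type                     = "ingress"
  from_port                = 2049
  to_port                  = 2049
  protocol                 = "tcp"
  security_group_id        = var.efs_mount_target_sg_id
  source_security_group_id = aws_security_group.datasync.id
}

# The EFS location attaches the DataSync group to whatever DataSync
# places in the given subnet. ARNs are copied from the question.
resource "aws_datasync_location_efs" "source_efs" {
  efs_file_system_arn = "arn:aws:elasticfilesystem:ap-southeast-2:XXXXX:file-system/fs-6b3f3753"

  ec2_config {
    security_group_arns = [aws_security_group.datasync.arn]
    subnet_arn          = "arn:aws:ec2:ap-southeast-2:XXXXX:subnet/subnet-09d919d3b76e9c7f0"
  }
}
```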
    

As heathesh pointed out in their comment, you should also check that the EFS file system policy and the DataSync role allow mounting. If you've already solved this issue, please share the solution!
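
A connection timeout typically points at networking rather than IAM, but if the file system policy does restrict mounts, a role along these lines is one way to grant access. This is only a sketch under assumptions: the role and policy names are made up, and the action list is limited to read-only mounting because EFS is the source here. Some setups may also need elasticfilesystem:ClientRootAccess to read every file.

```hcl
# Hypothetical role that DataSync assumes when mounting the file system.
resource "aws_iam_role" "datasync_efs" {
  name = "datasync-efs-access"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "datasync.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Grants mount (read) access on the file system from the question.
resource "aws_iam_role_policy" "datasync_efs" {
  name = "efs-client-mount"
  role = aws_iam_role.datasync_efs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["elasticfilesystem:ClientMount"]
      Resource = "arn:aws:elasticfilesystem:ap-southeast-2:XXXXX:file-system/fs-6b3f3753"
    }]
  })
}
```

To have DataSync use the role, the aws_datasync_location_efs resource would set file_system_access_role_arn = aws_iam_role.datasync_efs.arn together with in_transit_encryption = "TLS1_2".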

Upvotes: 0
