GorginZ

Reputation: 131

EC2 instance can't access amazon-linux repos (e.g. amazon-linux-extras install docker) through S3 gateway endpoint

I'm having S3 endpoint grief. When my instances initialize they cannot install Docker. Details:

I have ASG instances sitting in a VPC with public and private subnets. Appropriate routing and EIP/NAT is all stitched up. Instances in private subnets have outbound 0.0.0.0/0 routed to the NAT in their respective public subnets. The NACLs for the public subnets allow internet traffic in and out; the NACLs around the private subnets allow traffic from the public subnets in and out, traffic out to the internet, and traffic from the S3 CIDRs in and out. I want it pretty locked down.

I have been through this related question: Amazon EC2 instance can't update or use yum

and this other S3 struggle, which has a resolution:

https://blog.saieva.com/2020/08/17/aws-s3-endpoint-gateway-access-for-linux-2-amis-resolving-http-403-forbidden-error/

I have tried:

  S3Endpoint:
    Type: 'AWS::EC2::VPCEndpoint'
    Properties:
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - 's3:GetObject'
            Resource:
              - 'arn:aws:s3:::prod-ap-southeast-2-starport-layer-bucket/*'
              - 'arn:aws:s3:::packages.*.amazonaws.com/*'
              - 'arn:aws:s3:::repo.*.amazonaws.com/*'
              - 'arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/*'
              - 'arn:aws:s3:::amazonlinux.*.amazonaws.com/*'
              - 'arn:aws:s3:::*.amazonaws.com'
              - 'arn:aws:s3:::*.amazonaws.com/*'
              - 'arn:aws:s3:::*.ap-southeast-2.amazonaws.com/*'
              - 'arn:aws:s3:::*.ap-southeast-2.amazonaws.com/'
              - 'arn:aws:s3:::*repos.ap-southeast-2-.amazonaws.com'
              - 'arn:aws:s3:::*repos.ap-southeast-2.amazonaws.com/*'
              - 'arn:aws:s3:::repo.ap-southeast-2-.amazonaws.com'
              - 'arn:aws:s3:::repo.ap-southeast-2.amazonaws.com/*'
      RouteTableIds:
        - !Ref PrivateRouteTableA
        - !Ref PrivateRouteTableB
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
      VpcId: !Ref BasicVpc
      VpcEndpointType: Gateway

(As you can see, very desperate.) The first Resource entry is required for the ECR interface endpoints to pull image layers from S3; all the others are attempts to reach the amazon-linux-extras repos.

Below is the behavior happening on initialization, which I have recreated by connecting with Session Manager via the SSM endpoints. I have also followed this troubleshooting article:

https://aws.amazon.com/premiumsupport/knowledge-center/connect-s3-vpc-endpoint/

I cannot yum install or update:

[root@ip-10-0-3-120 bin]# yum install docker -y

Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Could not retrieve mirrorlist https://amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/2/core/latest/x86_64/mirror.list error was
14: HTTPS Error 403 - Forbidden

One of the configured repositories failed (Unknown), and yum doesn't have enough cached data to continue. At this point the only safe thing yum can do is fail. There are a few ways to work "fix" this:

 1. Contact the upstream for the repository and get them to fix the problem.

 2. Reconfigure the baseurl/etc. for the repository, to point to a working
    upstream. This is most often useful if you are using a newer
    distribution release than is supported by the repository (and the
    packages for the previous distribution release still work).

 3. Run the command with the repository temporarily disabled
        yum --disablerepo=<repoid> ...

 4. Disable the repository permanently, so yum won't use it by default. Yum
    will then just ignore the repository until you permanently enable it
    again or use --enablerepo for temporary usage:

        yum-config-manager --disable <repoid>
    or
        subscription-manager repos --disable=<repoid>

 5. Configure the failing repository to be skipped, if it is unavailable.
    Note that yum will try to contact the repo. when it runs most commands,
    so will have to try and fail each time (and thus. yum will be be much
    slower). If it is a very temporary problem though, this is often a nice
    compromise:

        yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: amzn2-core/2/x86_64

and I cannot run:

amazon-linux-extras install docker

Catalog is not reachable. Try again later.

catalogs at https://amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/2/extras-catalog-x86_64-v2.json, https://amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/2/extras-catalog-x86_64.json
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/amazon_linux_extras/software_catalog.py", line 131, in fetch_new_catalog
    request = urlopen(url)
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

Any gotchas I've missed? I'm very stuck here. I am familiar with basic VPC networking, NACLs, and the VPC endpoints I've used so far, and I have followed the troubleshooting guide (although I already had everything set up as outlined).

I feel the S3 policy is the problem here, or else the mirror list. Many thanks if you bothered to read all that! Thoughts?

Upvotes: 3

Views: 10637

Answers (3)

Nick

Reputation: 1273

By the looks of it, you are well aware of what you are trying to achieve. Even though you say it is not the NACLs, I would check them one more time, as it is easy to overlook something minor. Take into account the snippet below, taken from this AWS troubleshooting article, and make sure that you have the right S3 CIDRs in your rules for the respective region:

Make sure that the network ACLs associated with your EC2 instance's subnet allow the following:

- Egress on port 80 (HTTP) and 443 (HTTPS) to the Regional S3 service.
- Ingress on ephemeral TCP ports from the Regional S3 service. Ephemeral ports are 1024-65535.

The Regional S3 service is the CIDR for the subnet containing your S3 interface endpoint. Or, if you're using an S3 gateway, the Regional S3 service is the public IP CIDR for the S3 service. Network ACLs don't support prefix lists. To add the S3 CIDR to your network ACL, use 0.0.0.0/0 as the S3 CIDR. You can also add the actual S3 CIDRs into the ACL. However, keep in mind that the S3 CIDRs can change at any time.

Your S3 endpoint policy looks good to me on first look, but you are right that it is very likely that the policy or the endpoint configuration in general could be the cause, so I would re-check it one more time too.

One additional thing that I have observed before is that, depending on the AMI you use and your VPC settings (DHCP option set, DNS, etc.), sometimes the EC2 instance cannot properly set its default region in the yum config. Please check whether the files awsregion and awsdomain exist within the /etc/yum/vars directory, and what their content is. In your case, awsregion should contain:

$ cat /etc/yum/vars/awsregion
ap-southeast-2

You can check whether the DNS resolving on your instance is working properly with:

dig amazonlinux.ap-southeast-2.amazonaws.com

If DNS seems to be working fine, you can compare whether the IP in the output resides within the ranges you have allowed in your NACLs.

EDIT:

After having a second look, this line is a bit stricter than it should be:

arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2.s3.ap-southeast-2.amazonaws.com/*

According to the docs it should be something like:

arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2/*
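Applied to the endpoint in the question, the Resource list could then be trimmed to something like the sketch below. This is a suggestion, not a verified minimal policy: it keeps the starport-layer-bucket entry for ECR and addresses the Amazon Linux repo by its bucket name rather than by its URL hostname (S3 ARNs take bucket names, not domain names).

```yaml
PolicyDocument:
  Version: 2012-10-17
  Statement:
    - Effect: Allow
      Principal: '*'
      Action:
        - 's3:GetObject'
      Resource:
        # ECR image layers are served out of this regional bucket.
        - 'arn:aws:s3:::prod-ap-southeast-2-starport-layer-bucket/*'
        # Amazon Linux 2 yum/extras repo bucket, by bucket name (not hostname).
        - 'arn:aws:s3:::amazonlinux-2-repos-ap-southeast-2/*'
```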

Upvotes: 3

Technobeats

Reputation: 21

I had a similar issue: running "amazon-linux-extras" wasn't doing anything at all.

The problem was that the instance had both IPv4 and IPv6, and IPv6 wasn't working properly in our outbound network path. Disabling IPv6 solved it.

Upvotes: 0

GorginZ

Reputation: 131

Hi @nick (https://stackoverflow.com/users/9405602/nick): these are excellent suggestions. I'm writing an 'answer' because the troubleshooting will be valuable for others, plus the character limit in comments.

The problem is definitely the policy.


sh-4.2$ cat /etc/yum/vars/awsregion
ap-southeast-2

dig:


sh-4.2$ dig amazonlinux.ap-southeast-2.amazonaws.com

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.5.2 <<>> amazonlinux.ap-southeast-2.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 598
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;amazonlinux.ap-southeast-2.amazonaws.com. IN A

;; ANSWER SECTION:
amazonlinux.ap-southeast-2.amazonaws.com. 278 IN CNAME s3.dualstack.ap-southeast-2.amazonaws.com.
s3.dualstack.ap-southeast-2.amazonaws.com. 2 IN A 52.95.134.91

;; Query time: 4 msec
;; SERVER: 10.0.0.2#53(10.0.0.2)
;; WHEN: Mon Sep 20 00:03:36 UTC 2021
;; MSG SIZE rcvd: 112


let's check in on the NACLs:

NACL OUTBOUND RULES:

100  All traffic  All  All  0.0.0.0/0       Allow
101  All traffic  All  All  52.95.128.0/21  Allow
150  All traffic  All  All  3.5.164.0/22    Allow
200  All traffic  All  All  3.5.168.0/23    Allow
250  All traffic  All  All  3.26.88.0/28    Allow
300  All traffic  All  All  3.26.88.16/28   Allow
*    All traffic  All  All  0.0.0.0/0       Deny

NACL INBOUND RULES:

100  All traffic  All  All  10.0.0.0/24     Allow
150  All traffic  All  All  10.0.1.0/24     Allow
200  All traffic  All  All  10.0.2.0/24     Allow
250  All traffic  All  All  10.0.3.0/24     Allow
400  All traffic  All  All  52.95.128.0/21  Allow
450  All traffic  All  All  3.5.164.0/22    Allow
500  All traffic  All  All  3.5.168.0/23    Allow
550  All traffic  All  All  3.26.88.0/28    Allow
600  All traffic  All  All  3.26.88.16/28   Allow
*    All traffic  All  All  0.0.0.0/0       Deny

SO: '52.95.134.91' is captured by outbound rule 101 and inbound rule 400, so that looks good NACL-wise. (Future people troubleshooting: this is what you should look for.)
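For anyone doing the same "is this resolved IP captured by a NACL rule" check, it can be sketched in plain shell. This is a hypothetical helper (not anything AWS provides) that does IPv4-only bitwise CIDR matching:

```shell
# Hypothetical helper: does an IPv4 address fall inside a CIDR block?
# Pure bash arithmetic, no external tools required.
ip_to_int() {
  local IFS=.
  set -- $1
  # Pack the four octets into a single 32-bit integer.
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

in_cidr() {
  local ip base bits mask
  ip=$(ip_to_int "$1")
  base=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # Build the netmask for the prefix length, clamped to 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  # Address is inside the block iff the masked bits match.
  [ $(( ip & mask )) -eq $(( base & mask )) ]
}

# The resolved S3 IP against outbound rule 101's CIDR:
in_cidr 52.95.134.91 52.95.128.0/21 && echo "captured by 52.95.128.0/21"
```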

Also, regarding those CIDR blocks: the deploy script pulls them from the current published list, grabs out the S3 ones for ap-southeast-2 with jq, and passes them as parameters to the CloudFormation deploy.

Docs on how to do that, for others: https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html#aws-ip-download
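A sketch of that extraction step. The jq filter matches the real shape of ip-ranges.json; the inline sample here is trimmed down so the filter can be demonstrated offline. In the actual deploy script you would fetch the live file (e.g. `curl -s https://ip-ranges.amazonaws.com/ip-ranges.json`) instead of the sample:

```shell
# Trimmed sample of AWS's ip-ranges.json so the jq filter can run offline.
cat > /tmp/ip-ranges-sample.json <<'EOF'
{
  "prefixes": [
    {"ip_prefix": "52.95.128.0/21", "region": "ap-southeast-2", "service": "S3"},
    {"ip_prefix": "3.5.164.0/22", "region": "ap-southeast-2", "service": "S3"},
    {"ip_prefix": "54.240.192.0/22", "region": "ap-southeast-2", "service": "EC2"}
  ]
}
EOF

# Keep only the S3 prefixes for ap-southeast-2, one CIDR per line.
jq -r '.prefixes[]
       | select(.region == "ap-southeast-2" and .service == "S3")
       | .ip_prefix' /tmp/ip-ranges-sample.json
```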

Another note: you might notice the outbound 0.0.0.0/0. I realize (and for other people looking, please note) that this makes the other rules redundant; I just put it in 'in case' while fiddling (and removed the outbound rules to the public subnets). Private subnet traffic outbound to 0.0.0.0/0 is routed to the respective NATs in the public subnets. I'll add outbound rules for my public subnets and remove this rule at some point.

Subnetting at the moment is simply:

VPC:    10.0.0.0/16
pub a:  10.0.0.0/24
pub b:  10.0.1.0/24
priv a: 10.0.2.0/24
priv b: 10.0.3.0/24

so outbound rules for the pub a and b blocks will be re-introduced so I can remove the allow on 0.0.0.0/0.


I am now sure it is the policy.

I just click-ops amended the policy in the console to 'full access' to give that a crack, and had success.

My guess is that the mirror list makes it hard to pin down what to explicitly allow, so even though I cast a broad net I wasn't capturing the required bucket. But I don't know much about how the AWS mirrors work, so that's a guess.

I probably don't want a super-permissive policy, so this isn't really a fix, but it confirms where the issue is.

Upvotes: 2
