gary

Reputation: 695

environment variables applied during elastic beanstalk deploy

My basic question: How would I set an environment variable that will be in effect during the Elastic Beanstalk deploy process?

I am not talking about setting environment variables during deployment that will be accessible by my application after it is deployed, I want to set environment variables that will modify a specific behavior of Elastic Beanstalk's build scripts.

To be clear - I generally think this is a bad idea, but it might be OK in this case so I am trying this out as an experiment. Here is some background about why I am looking into this, and why I think it might be OK:

I am in the process of transferring a server from AWS in the US to AWS in China, and am finding that server deploys fail between 50% ~ 100% of the time, depending on the day. This is a major pain during development, but I am primarily concerned about how I am going to make this work in production.

This is an Amazon Linux server running Python 2.7, and the logs indicate that the failures are mainly Read Timeout errors, with the occasional Connection Reset by Peer, all raised by pip install while attempting to download packages from PyPI. To verify this I ssh'd into my instances and manually installed a few packages, and saw similar failure rates on a small sample. Note that this is pretty common when trying to access content on the other side of China's GFW.

So, I wrote a script that runs pip download to fetch the packages to my local machine, then aws s3 syncs them to an S3 bucket located in the same region as my server. This eliminates the need to cross the GFW while deploying.
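The local half of that workflow is just two commands; here is a minimal sketch, in which the bucket name and requirements file are placeholder assumptions, not values from my actual setup:

```shell
#!/usr/bin/env bash
# Sketch of the local download-and-sync step. The bucket name and
# requirements file below are illustrative placeholders.
set -euo pipefail

package_dir=./packages
mkdir -p "$package_dir"

# Fetch wheels/sdists for every requirement into the local directory
# (requires a pip version that has the "download" subcommand).
pip download -r requirements.txt -d "$package_dir"

# Mirror the directory to an S3 bucket in the same region as the server.
aws s3 sync "$package_dir" s3://my-eb-packages/packages
```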

My original plan was to add an .ebextension that aws s3 cps the packages from S3 into the pip cache, but (unless I missed something) this somewhat surprisingly doesn't appear to be straightforward.

So, as plan B I am redirecting the packages into a local directory on the instance. This is working well, but I can't get pip install to pull packages from the local directory rather than downloading the packages from pypi.

Following the pip documentation, I expected that pointing the PIP_FIND_LINKS environment variable at my package directory would have pip "naturally" pull packages from that directory rather than from PyPI. That would make the change transparent to the EB build scripts, which is why I thought this might be a reasonable solution.
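For reference, pip maps any of its long options to a PIP_&lt;OPTION&gt; environment variable, so the export being attempted (with an illustrative path) is simply:

```shell
# With this set, every pip invocation behaves as if --find-links had
# been passed on the command line (path is an illustrative placeholder).
export PIP_FIND_LINKS=/path/to/packages

# One-off equivalent, without the environment variable:
#   pip install --find-links=/path/to/packages <package>
```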

So far I have tried:

1) a command which exports PIP_FIND_LINKS=/path/to/package, with no luck. I assumed that this was due to the deploy step being called from a different session, so I then tried:

2) a command which (in addition to the previous export) appends export PIP_FIND_LINKS=/path/to/package to ~/.profile, in an attempt to have the setting apply to any new sessions.

I have tried issuing the commands by both ec2_user and root, and neither works.

Rather than keep poking a stick at this, I was hoping that someone with a bit more experience with the nuances of EB, pip, etc might be able to provide some guidance.

Upvotes: 2

Views: 955

Answers (1)

gary

Reputation: 695

After some thought I decided that a pip config file should be a more reliable solution than environment variables.

This turned out to be easy to implement with .ebextensions. I first create the download script, then create the config file directly in the virtualenv folder:

files:

  /home/ec2-user/download_packages.sh:
    mode: "000500"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash

      package_dir=/path/to/packages

      mkdir -p $package_dir
      aws s3 sync s3://bucket/packages $package_dir

  /opt/python/run/venv/pip.conf:
    mode: "000755"
    owner: root
    group: root
    content: |
      [install]
      find-links = file:///path/to/packages
      no-index = false

Finally, a command is used to call the script that we just created:

commands:

  03_download_packages:
    command: bash /home/ec2-user/download_packages.sh

One potential issue is that pip bypasses the local package directory for dependencies installed directly from our private git repo, so there is still some potential for timeout errors, but these represent just a small fraction of the packages that need to be installed, so it should be workable.
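To illustrate with a hypothetical requirements file (both lines are made-up examples): a pinned release like the first entry can be resolved from the find-links directory, while a VCS requirement like the second is always fetched over the network, find-links or not:

```text
requests==2.9.1
git+https://example.com/ourteam/private-lib.git#egg=private-lib
```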

Still unsure if this will be a long-term solution, but it is very simple and (after just one day of testing...) failure rates have fallen from 50% ~ 100% to 0%.

Upvotes: 1
