Saifeddine Rajhi

Concurrent Terraform installation issue using tfenv with TFENV_AUTO_INSTALL in a Terragrunt environments repository

Issue:

When the TFENV_AUTO_INSTALL environment variable is enabled in a Terragrunt repository, concurrent installations of many different Terraform versions trigger a race condition.

Parallel pipeline jobs make tfenv install several Terraform versions at the same time, and some of the jobs then fail with "Permission denied" errors.

My repository layout:

dev-account01
└── eu-west-1
    ├── iam_roles
    │   ├── .terraform-version
    │   └── main.tf
    └── networking
        ├── .terraform-version
        └── main.tf

Each module pins a different Terraform version (1.6.2 and 1.5.5).

PS: my actual setup has many more regions, modules, and accounts.

Error Message:

/home/user/.tfenv/lib/tfenv-exec.sh: line 43:  /home/user/.tfenv/versions/1.6.2/terraform: Permission denied
/home/user/.tfenv/lib/tfenv-exec.sh: line 43: exec: /home/user/.tfenv/versions/1.6.2/terraform: cannot execute: Permission denied
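The error says the `terraform` binary under `~/.tfenv/versions/1.6.2` cannot be executed. To check which installed versions are affected, a small diagnostic can list the version directories whose `terraform` binary is missing or not executable. This is a sketch: `find_broken_versions` is a hypothetical helper, and treating such directories as leftovers of an interrupted install (to be removed, or cleaned with `tfenv uninstall <version>`, before a retry) is an assumption.

```shell
#!/bin/bash
# Hypothetical diagnostic: print each tfenv version directory whose terraform
# binary is missing or lacks execute permission.
find_broken_versions() {
  # Default matches the path in the error message; override with an argument.
  local versions_dir="${1:-$HOME/.tfenv/versions}"
  local dir
  for dir in "$versions_dir"/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    if [ ! -x "${dir}terraform" ]; then
      basename "$dir"           # e.g. "1.6.2"
    fi
  done
}
```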

Reproducible Scenario:

  1. Enable TFENV_AUTO_INSTALL in a Terragrunt repo.
  2. Trigger a pipeline with multiple jobs/plans that need several Terraform versions that have not been installed yet.
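The two steps above can be approximated locally by running the same command in several module directories at once, the way parallel pipeline jobs do. This is a reproduction sketch: `run_in_parallel` is a hypothetical helper (not part of tfenv or Terragrunt), and it assumes the `terraform` on `PATH` is the tfenv shim, so running it inside a module with an uninstalled `.terraform-version` triggers an auto-install.

```shell
#!/bin/bash
# Hypothetical helper: run one command in several directories concurrently,
# then print how many runs exited with a non-zero status.
# Usage: run_in_parallel DIR... -- COMMAND [ARGS...]
run_in_parallel() {
  local -a dirs=() pids=()
  local dir pid failures=0
  # Everything before "--" is a directory; everything after is the command
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    dirs+=("$1")
    shift
  done
  shift   # drop the "--" separator
  for dir in "${dirs[@]}"; do
    # Each job runs in its own background subshell inside one module
    ( cd "$dir" && "$@" ) &
    pids+=("$!")
  done
  # Wait for every job and count non-zero exit statuses
  for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
  done
  echo "$failures"
}
```

For example, after `export TFENV_AUTO_INSTALL=true` and with 1.6.2 and 1.5.5 not yet installed, `run_in_parallel dev-account01/eu-west-1/*/ -- terraform version` starts both modules' auto-installs at the same moment.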

Expected Behavior:

TFENV_AUTO_INSTALL should handle concurrent installations gracefully (or serialize them), avoiding race conditions and "Permission denied" errors.

Alternatively, is there a way to serialize the installation of the different Terraform versions used by my modules in each account?
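One way to serialize the installs, sketched under the assumption that the pipeline can run a single setup step before the parallel plan jobs start, is to collect the distinct versions from every `.terraform-version` file and install them one after another with `tfenv install <version>`. `preinstall_versions` is a hypothetical name, and the sketch assumes `tfenv install` returns success for a version that is already installed, so re-running the step is harmless.

```shell
#!/bin/bash
# Hypothetical setup step: install each pinned Terraform version exactly once,
# sequentially. Assumes tfenv is on PATH and bash 4+ (for mapfile).
preinstall_versions() {
  local root="${1:-.}"
  local version
  local -a versions
  # Distinct versions from every .terraform-version file under $root
  # (sub() strips Windows line endings; NF skips blank lines)
  mapfile -t versions < <(
    find "$root" -type f -name .terraform-version \
      -exec awk '{ sub(/\r$/, "") } NF { print $1 }' {} + | sort -u
  )
  for version in "${versions[@]}"; do
    # Sequential: the next install starts only after this one returns
    tfenv install "$version" || return 1
  done
}
```

Called as `preinstall_versions dev-account01` in one job, it leaves every version in place before the parallel jobs run, so the auto-install path is never reached concurrently.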

EDIT:

Example of a possible solution:

#!/bin/bash
# Serialize a command that may trigger a tfenv auto-install.
# Usage: ./tfenv-wrapper.sh terragrunt plan
# (wrap the command that triggers the install, not "tfenv install" itself,
# so this script does not match its own pgrep pattern below)

LOCK_FILE="/tmp/tfenv-wrapper.lock"
MAX_CONCURRENT_PROCESSES=1

# Function to acquire a lock, retrying every 5 seconds
function acquire_lock() {
  # Open the lock file once on file descriptor 202
  exec 202>"$LOCK_FILE"
  until flock -n 202; do
    echo "Another instance of the script is already running. Waiting for it to complete."
    sleep 5
  done
}

# Function to release the lock
function release_lock() {
  flock -u 202
  # Do not delete $LOCK_FILE: a waiting process may already have it open,
  # and a newly created file would let two processes hold the lock at once.
}

# Function to count running "tfenv install" processes other than this script
function check_tfenv_processes() {
  # grep -vxF drops only the line that is exactly this script's PID
  pgrep -f "tfenv install" | grep -vxF "$$" | wc -l
}

# Acquire the lock and release it however the script exits
acquire_lock
trap release_lock EXIT

# Wait for any "tfenv install" started outside this wrapper to finish
num_processes=$(check_tfenv_processes)
while [ "$num_processes" -ge "$MAX_CONCURRENT_PROCESSES" ]; do
  echo "Maximum number of concurrent 'tfenv install' processes reached. Waiting for processes to complete."
  sleep 5
  num_processes=$(check_tfenv_processes)
done

# Run the wrapped command while holding the lock
"$@"
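A shorter variant of the same lock idea, assuming the util-linux `flock(1)` is available in the pipeline image (BusyBox `flock` lacks `-w`), uses its command mode: `flock` opens the lock file, blocks until the lock is free, runs the command, and releases the lock when the command exits, so no polling loop is needed. The `with_tfenv_lock` name, the `TFENV_LOCK_FILE` variable, and the 600-second timeout are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical wrapper around flock(1) command mode.
TFENV_LOCK_FILE="${TFENV_LOCK_FILE:-/tmp/tfenv-wrapper.lock}"

with_tfenv_lock() {
  # Block until the lock is free, then run "$@"; -w 600 gives up with a
  # non-zero status after 10 minutes. The command's exit status is returned.
  flock -w 600 "$TFENV_LOCK_FILE" "$@"
}
```

Each job would then run e.g. `with_tfenv_lock terragrunt plan`. This only serializes jobs that share the same `/tmp`, which holds when all plans run on one Atlantis pod, as in the workspace listing below.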

Current workspaces:

atlantis-git-test-0:/$ ls -l /atlantis-data/repos/orga/infra-test/4
total 24
drwx--S---    5 atlantis atlantis      4096 Jan  8 10:00 default
drwx--S---    5 atlantis atlantis      4096 Jan  8 10:00 environments_eks-dev-1_09_eks
drwx--S---    5 atlantis atlantis      4096 Jan  8 10:00 environments_eks-dev-1_11_r53_zones
drwx--S---    5 atlantis atlantis      4096 Jan  8 10:00 environments_eks-dev-1_13_irsa
drwx--S---    5 atlantis atlantis      4096 Jan  8 10:00 environments_eks-dev-1_15_vault
drwx--S---    5 atlantis atlantis      4096 Jan  8 10:00 environments_eks-staging-1_11_r53_zones
