GaTechThomas

Reputation: 6114

Optimize retrieval of multiple git-sourced terraform modules from the same repo

We have terraform code in a git repo that references custom modules in another private repo:

module "myModule" {
  source = "git::https://mygiturl//some/module"

  # bla bla bla... (other module arguments)
}

When we reference multiple modules that live in the same git repo, terraform init clones that repo again for every module reference. As a result, it takes minutes to do something that would take seconds if the repo were not cloned repeatedly into different folders.

What options do we have for optimizing the module retrieval for speed?

Upvotes: 0

Views: 2129

Answers (1)

Martin Atkins

Reputation: 74479

The terraform init command does include an optimization: it tries to recognize when a module's package address matches that of a module that was already installed, and if so it will try to copy the existing content already cached on local disk rather than retrieving the content over the network a second time.

In order for that to work though, all of the modules must have the same package address. The "package address" is the part of the address which tells Terraform what "package" (repository, archive) it should download, as opposed to which directory inside that package it should look in to find the module's .tf files.

If you are specifying particular subdirectories inside a single repository then you are presumably already using the Modules in Package Sub-directories syntax where the package name is separated from the subdirectory path using a pair of slashes //, giving a source address like this:

module "example" {
  source = "git::https://example.com/foo/bar.git//path/to/directory"
}

In the above, the package address is git::https://example.com/foo/bar.git and the subdirectory path is path/to/directory. It's the package address portion that needs to match across multiple module calls in order for Terraform to detect this opportunity for optimization.
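
As a rough illustration of what "matching" means here, the sketch below shows two module calls that share one package address and differ only in the subdirectory part. The module names and the modules/network and modules/database paths are hypothetical, chosen only for the example:

```hcl
# Both calls use the same package address:
#   git::https://example.com/foo/bar.git
# Only the part after the "//" separator differs, so the installer has
# the opportunity to reuse the content it already fetched for the first call.
module "network" {
  source = "git::https://example.com/foo/bar.git//modules/network"
}

module "database" {
  source = "git::https://example.com/foo/bar.git//modules/database"
}
```

If one of these calls spelled the repository URL differently, the package addresses would presumably no longer match, so it is worth keeping the portion before the // identical in every call.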


Another option, if your goal is to have everything in a single repository anyway, is to use only relative paths starting with ../ and ./ in your module source addresses.

When you specify a local path, Terraform understands it as referring to another directory within the same module package as the caller, and so Terraform doesn't need to download anything else or create any local copies in order to create a unique directory for that call.
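
A minimal sketch of that layout, assuming a hypothetical repository with a modules/network and a modules/tags directory next to the root configuration:

```hcl
# In the root module (for example ./main.tf): a module in the same repository.
module "network" {
  source = "./modules/network"
}

# Inside ./modules/network/main.tf: a sibling module, reached with ../
# (the "tags" module and its path are assumptions for illustration).
module "tags" {
  source = "../tags"
}
```

Because both source addresses are local paths, neither call should trigger a separate download during terraform init.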

This approach does assume that you want to have everything in a single repository. If you have a hybrid approach where some modules are isolated into separate repositories but others are kept together in a large repository then that is a design pattern that Terraform's module installer is not designed to support well.


If the installer optimization isn't sufficient and you cannot use a single repository for everything then the only remaining option would be to split your modules across multiple smaller packages. A Git repository is one example of a "package", but you can also potentially add a level of indirection by adding a CI process to your repository which packages up the modules into separate packages and publishes those packages somewhere else that Terraform can install from, such as .zip files in an Amazon S3 bucket.
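
For example, if a CI job published each module as its own archive, a call might look roughly like the sketch below. The bucket name, region, and archive name are hypothetical, and the CI step that builds and uploads the .zip file is not shown:

```hcl
# Each module is installed from its own small package rather than from
# the full Git repository. Bucket, region, and file name are assumptions.
module "network" {
  source = "s3::https://s3-eu-west-1.amazonaws.com/examplecorp-terraform-modules/network.zip"
}
```

Terraform would then download only that one archive for this call, using whatever AWS credentials are available in the environment where terraform init runs.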

Terraform does not offer a way to share the same local directory between multiple module packages. Modules are sometimes written in a way that causes them to modify their own source directory during execution (not a recommended pattern, but still possible), and in that case the module is likely to misbehave if multiple instances of it are trying to work in the same directory.

Upvotes: 2
