Silverlan

Reputation: 2911

Using GitHub cache action with multiple cache paths?

I'm trying to use the official GitHub cache action (https://github.com/actions/cache) to cache some binary files and speed up some of my workflows. However, I've been unable to get it working when specifying multiple cache paths.

Here's a simple, working test I've set up using a single cache path: There is one action for writing the cache, and one for reading it (both executed in separate workflows, but on the same repository and branch). The write-action is executed first, and creates a file "subdir/a.txt", and then caches it with the "actions/cache@v2" action:

    # Test with single path
    - name: Create file
      shell: bash
      run: |
        mkdir subdir
        cd subdir
        printf '%s' "Lorem ipsum" >> a.txt
        
    - name: Write cache (Single path)
      uses: actions/cache@v2
      with:
        path: "D:/a/cache_test/cache_test/**/*.txt"
        key: test-cache-single-path

The read-action retrieves the cache, prints a list of all files in the directory recursively to confirm it has restored the file from the cache, and then prints the contents of the cached txt-file:

    - name: Get cached file
      uses: actions/cache@v2
      id: get-cache
      with:
        path: "D:/a/cache_test/cache_test/**/*.txt"
        key: test-cache-single-path
    
    - name: Print files
      shell: bash
      run: |
        echo "Cache hit: ${{steps.get-cache.outputs.cache-hit}}"
        cd "D:/a/cache_test/cache_test"
        ls -R
        cat "D:/a/cache_test/cache_test/subdir/a.txt"

This works without any issues.

Now, the description of the cache action contains an example for specifying multiple cache paths:

    - uses: actions/cache@v2
      with:
        path: |
          path/to/dependencies
          some/other/dependencies
        key: ${{ runner.os }}-${{ hashFiles('**/lockfiles') }}

But when I try that for my example actions, it fails to work. In the new write-action, I create two files, "subdir/a.txt" and "subdir/b.md", and then cache them by specifying two paths:

    # Test with multiple paths
    - name: Create files
      shell: bash
      run: |
        mkdir subdir
        cd subdir
        printf '%s' "Lorem ipsum" >> a.txt
        printf '%s' "dolor sit amet" >> b.md

    - name: Write cache (Multi path)
      uses: actions/cache@v2
      with:
        path: |
          "D:/a/cache_test/cache_test/**/*.txt"
          "D:/a/cache_test/cache_test/**/*.md"
        key: test-cache-multi-path

The new read-action is the same as the old one, but also specifies both paths:

    # Read cache
    - name: Get cached file
      uses: actions/cache@v2
      id: get-cache
      with:
        path: |
          "D:/a/cache_test/cache_test/**/*.txt"
          "D:/a/cache_test/cache_test/**/*.md"
        key: test-cache-multi-path
    
    - name: Print files
      shell: bash
      run: |
        echo "Cache hit: ${{steps.get-cache.outputs.cache-hit}}"
        cd "D:/a/cache_test/cache_test"
        ls -R
        cat "D:/a/cache_test/cache_test/subdir/a.txt"
        cat "D:/a/cache_test/cache_test/subdir/b.md"

This time I still get the confirmation that the cache has been read:

    Cache restored successfully
    Cache restored from key: test-cache-multi-path
    Cache hit: true

However, "ls -R" does not list the files, and the "cat" commands fail because the files do not exist.

Where is my error? What is the proper way of specifying multiple paths with the cache action?

Upvotes: 16

Views: 14105

Answers (3)

Maciej Skorski

Reputation: 3354

My use case was to cache both pip and APT packages in order to compile documentation with Sphinx. While multiple paths worked out of the box, the non-trivial part was redirecting the APT cache to a directory in user space. Here is the workflow:

name: docs

on: [push, pull_request, workflow_dispatch]

jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - name: cache dependencies
        id: cache_deps
        uses: actions/cache@v3
        env:
            cache-name: cache-dependencies
        with:
          path: |
            .cache/apt/archives
            .venv
          key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('.github/workflows/*') }}
      - name: Install dependencies fresh
        if: ${{ steps.cache_deps.outputs.cache-hit != 'true' }}
        run: |
          python -m venv .venv
          source .venv/bin/activate
          pip install jupyter-book
          pip install sphinxcontrib-plantuml
          echo "Dir::Cache \"$PWD/.cache/apt\";" | sudo tee -a /etc/apt/apt.conf
          sudo mkdir -p .cache/apt/archives/partial
          sudo apt-get -o Dir::Cache::archives=archives install plantuml
      - name: Install dependencies cache
        run: |
          sudo dpkg -i .cache/apt/archives/*.deb
          sudo chown -Rv $(whoami) .cache/apt
      - name: Compile Docs
        run: |
          source .venv/bin/activate
          jupyter-book build docs
      - name: Deploy to gh-pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_branch: gh-pages
          publish_dir: ./docs/_build/html

This workflow has been demonstrated to work.

Upvotes: 0

codeaprendiz

Reputation: 3205

I came here to see whether I could cache multiple binary files. I see that the question uses one workflow for writing the cache and another for reading it. We had a different use case, where we needed to install certain dependencies, so I'm sharing it here.

Use case

  • Your workflow needs gcc and python3 to run. (The dependencies could be anything else as well.)
  • You have a script, ./install-dependencies.sh, that installs the dependencies, and you pass it the appropriate environment variables, such as ENV_INSTALL_PYTHON=true or ENV_INSTALL_GCC=true.

Points to be noted

  • ./install-dependencies.sh installs the dependencies into ~/bin and produces the executable binaries in the same path. It also ensures that the $PATH environment variable is updated with the new binary paths.
  • Instead of duplicating the cache-check and install steps for each binary (two binaries here), a single pair of steps handles both. Even with 50 binaries to install, the same two steps would suffice.
  • The cache key name python-gcc-cache-key can be anything, but make sure it is unique.
  • The third step (- name: install python, gcc) installs the binaries when the key python-gcc-cache-key was not found. The cache is then saved under that key by the cache action's post-job step, which is why the key name does not appear anywhere in the install step.
  • The first step checks out your repository, which contains the ./install-dependencies.sh script.
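The point about keeping the key unique can be sketched as follows. This is only an illustration, not part of the original workflow: it assumes the script lives at .github/workflows/install-dependencies.sh (as the working-directory setting suggests) and derives the key from the runner OS and a hash of that script, so that changing the script produces a new cache entry.

```yaml
# Sketch only: a key that changes whenever the install script changes.
- name: python, gcc cache
  id: python-gcc-cache
  uses: actions/cache@v2
  with:
    path: |
      ~/bin/python
      ~/bin/gcc
    # Assumption: the script path below matches the repository layout.
    key: ${{ runner.os }}-python-gcc-${{ hashFiles('.github/workflows/install-dependencies.sh') }}
```

With a fixed key such as python-gcc-cache-key, an existing cache entry is never replaced, so a key tied to the script contents may be preferable when the installed versions change over time.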

Workflow

name: Install dependencies
on: [push]
jobs:
  install_dependencies:
    runs-on: ubuntu-latest
    name: Install python, gcc
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      # python, gcc installation
      # Check whether python and gcc are present in the cache
      - name: python, gcc cache
        id: python-gcc-cache
        uses: actions/cache@v2
        with:
          path: |
            ~/bin/python
            ~/bin/gcc
          key: python-gcc-cache-key
      # Install python and gcc if they were not found in the cache
      - name: install python, gcc
        if: steps.python-gcc-cache.outputs.cache-hit != 'true'
        working-directory: .github/workflows
        env:
          ENV_INSTALL_PYTHON: true
          ENV_INSTALL_GCC: true
        run: |
          ./install-dependencies.sh

      - name: validate python, gcc
        working-directory: .github/workflows
        run: |
          ENV_INSTALL_BINARY_DIRECTORY_LINUX="$HOME/bin"
          export PATH="$ENV_INSTALL_BINARY_DIRECTORY_LINUX:$PATH"
          python3 --version
          gcc --version
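The validate step above extends PATH by hand, and that change only lasts for that one step. As a hedged alternative, the sketch below appends the directory to the file named by the GITHUB_PATH environment variable, which GitHub Actions reads to extend PATH for all subsequent steps in the job. The step names are made up for this example.

```yaml
# Sketch only: make ~/bin visible to every later step in the job.
- name: add ~/bin to PATH
  run: echo "$HOME/bin" >> "$GITHUB_PATH"

# Later steps no longer need to export PATH themselves.
- name: validate python, gcc
  run: |
    python3 --version
    gcc --version
```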

Benefits

The benefit depends on which binaries you are installing. For us, each cache hit saved nearly 50 seconds.

Upvotes: 0

I was able to make it work with a few modifications:

  • use relative paths instead of absolute
  • use a hash of the content for the key

At least under bash, the absolute paths appear to look like this:

  • /d/a/so-foobar-cache/so-foobar-cache/cache_test/cache_test/subdir

Where so-foobar-cache is the name of the repository.

.github/workflows/foobar.yml


name: Store and Fetch cached files
on: [push]
jobs:
  store:
    runs-on: windows-2019
    steps:
      - name: Create files
        shell: bash
        id: store
        run: |
          mkdir -p 'cache_test/cache_test/subdir'
          cd 'cache_test/cache_test/subdir'
          echo pwd $(pwd)
          printf '%s' "Lorem ipsum" >> a.txt
          printf '%s' "dolor sit amet" >> b.md
          cat a.txt b.md
      - name: Store in cache
        uses: actions/cache@v2
        with:
          path: |
            cache_test/cache_test/**/*.txt
            cache_test/cache_test/**/*.md
          key: multiple-files-${{ hashFiles('cache_test/cache_test/**') }}
      - name: Print files (A)
        shell: bash
        run: |
          echo "Cache hit: ${{steps.store.outputs.cache-hit}}"
          find cache_test/cache_test/subdir
          cat cache_test/cache_test/subdir/a.txt
          cat cache_test/cache_test/subdir/b.md


  fetch:
    runs-on: windows-2019
    needs: store
    steps:
      - name: Restore
        uses: actions/cache@v2
        with:
          path: |
            cache_test/cache_test/**/*.txt
            cache_test/cache_test/**/*.md
          key: multiple-files-${{ hashFiles('cache_test/cache_test/**') }}
          restore-keys: |
            multiple-files-${{ hashFiles('cache_test/cache_test/**') }}
            multiple-files-
      - name: Print files (B)
        shell: bash
        run: |
          find cache_test -type f | xargs -t grep -e.
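One further difference from the question's workflow is worth pointing out, offered as a likely contributing factor rather than a confirmed diagnosis: the paths in the workflow above are not wrapped in quotes. In YAML, a double-quoted scalar has its quotes removed by the parser, but inside a literal block scalar (the `|` form) every character is kept as written, so quotes placed there become part of the path pattern. The keys in this sketch are hypothetical.

```yaml
steps:
  # Single path as a double-quoted scalar: the parser strips the quotes,
  # so the action receives  cache_test/cache_test/**/*.txt
  - uses: actions/cache@v2
    with:
      path: "cache_test/cache_test/**/*.txt"
      key: example-single-path   # hypothetical key

  # Multiple paths as a literal block scalar: each line is passed through
  # verbatim, so the lines are written without surrounding quotes.
  - uses: actions/cache@v2
    with:
      path: |
        cache_test/cache_test/**/*.txt
        cache_test/cache_test/**/*.md
      key: example-multi-path    # hypothetical key
```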

Log

$ gh run view 1446486801 

✓ master Store and Fetch cached files · 1446486801
Triggered via push about 3 minutes ago

JOBS
✓ store in 5s (ID 4171907768)
✓ fetch in 10s (ID 4171909690)

First job

$ gh run view 1446486801 --log --job=4171907768 | grep -e Create  -e Store -e Print
store   Create files    2021-11-10T22:59:32.1396931Z ##[group]Run mkdir -p 'cache_test/cache_test/subdir'
store   Create files    2021-11-10T22:59:32.1398025Z mkdir -p 'cache_test/cache_test/subdir'
store   Create files    2021-11-10T22:59:32.1398695Z cd 'cache_test/cache_test/subdir'
store   Create files    2021-11-10T22:59:32.1399360Z echo pwd $(pwd)
store   Create files    2021-11-10T22:59:32.1399936Z printf '%s' "Lorem ipsum" >> a.txt
store   Create files    2021-11-10T22:59:32.1400672Z printf '%s' "dolor sit amet" >> b.md
store   Create files    2021-11-10T22:59:32.1401231Z cat a.txt b.md
store   Create files    2021-11-10T22:59:32.1623649Z shell: C:\Program Files\Git\bin\bash.EXE --noprofile --norc -e -o pipefail {0}
store   Create files    2021-11-10T22:59:32.1626211Z ##[endgroup]
store   Create files    2021-11-10T22:59:32.9569082Z pwd /d/a/so-foobar-cache/so-foobar-cache/cache_test/cache_test/subdir
store   Create files    2021-11-10T22:59:32.9607728Z Lorem ipsumdolor sit amet
store   Store in cache  2021-11-10T22:59:33.9705422Z ##[group]Run actions/cache@v2
store   Store in cache  2021-11-10T22:59:33.9706196Z with:
store   Store in cache  2021-11-10T22:59:33.9706815Z   path: cache_test/cache_test/**/*.txt
store   Store in cache  cache_test/cache_test/**/*.md
store   Store in cache  
store   Store in cache  2021-11-10T22:59:33.9708499Z   key: multiple-files-25c0e6413e23766a3681413625169cee1ca3a7cd2186cc1b1df5370fb43bce55
store   Store in cache  2021-11-10T22:59:33.9709961Z ##[endgroup]
store   Store in cache  2021-11-10T22:59:35.1757943Z Received 260 of 260 (100.0%), 0.0 MBs/sec
store   Store in cache  2021-11-10T22:59:35.1761565Z Cache Size: ~0 MB (260 B)
store   Store in cache  2021-11-10T22:59:35.1781110Z [command]C:\Windows\System32\tar.exe -z -xf D:/a/_temp/653f7664-e139-4930-9710-e56942f9fa47/cache.tgz -P -C D:/a/so-foobar-cache/so-foobar-cache
store   Store in cache  2021-11-10T22:59:35.2069751Z Cache restored successfully
store   Store in cache  2021-11-10T22:59:35.2737840Z Cache restored from key: multiple-files-25c0e6413e23766a3681413625169cee1ca3a7cd2186cc1b1df5370fb43bce55
store   Print files (A) 2021-11-10T22:59:35.3087596Z ##[group]Run echo "Cache hit: "
store   Print files (A) 2021-11-10T22:59:35.3088324Z echo "Cache hit: "
store   Print files (A) 2021-11-10T22:59:35.3088983Z find cache_test/cache_test/subdir
store   Print files (A) 2021-11-10T22:59:35.3089571Z cat cache_test/cache_test/subdir/a.txt
store   Print files (A) 2021-11-10T22:59:35.3090176Z cat cache_test/cache_test/subdir/b.md
store   Print files (A) 2021-11-10T22:59:35.3104465Z shell: C:\Program Files\Git\bin\bash.EXE --noprofile --norc -e -o pipefail {0}
store   Print files (A) 2021-11-10T22:59:35.3106449Z ##[endgroup]
store   Print files (A) 2021-11-10T22:59:35.3494703Z Cache hit: 
store   Print files (A) 2021-11-10T22:59:35.4456032Z cache_test/cache_test/subdir
store   Print files (A) 2021-11-10T22:59:35.4456852Z cache_test/cache_test/subdir/a.txt
store   Print files (A) 2021-11-10T22:59:35.4459226Z cache_test/cache_test/subdir/b.md
store   Print files (A) 2021-11-10T22:59:35.4875011Z Lorem ipsumdolor sit amet
store   Post Store in cache 2021-11-10T22:59:35.6109511Z Post job cleanup.
store   Post Store in cache 2021-11-10T22:59:35.7899690Z Cache hit occurred on the primary key multiple-files-25c0e6413e23766a3681413625169cee1ca3a7cd2186cc1b1df5370fb43bce55, not saving cache.
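Note that "Cache hit:" prints an empty value in the log above. In the workflow, the id `store` is attached to the "Create files" step, which has no `cache-hit` output. A minimal sketch of reading the output from the cache step instead, assuming a hypothetical id `cache-files`:

```yaml
# Sketch only: give the cache step its own id and read cache-hit from it.
- name: Store in cache
  id: cache-files              # hypothetical id, not in the original workflow
  uses: actions/cache@v2
  with:
    path: |
      cache_test/cache_test/**/*.txt
      cache_test/cache_test/**/*.md
    key: multiple-files-${{ hashFiles('cache_test/cache_test/**') }}
- name: Print cache status
  shell: bash
  run: |
    echo "Cache hit: ${{ steps.cache-files.outputs.cache-hit }}"
```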

Second job

$ gh run view 1446486801 --log --job=4171909690  | grep -e Restore -e Print
fetch   Restore 2021-11-10T22:59:50.8498516Z ##[group]Run actions/cache@v2
fetch   Restore 2021-11-10T22:59:50.8499346Z with:
fetch   Restore 2021-11-10T22:59:50.8499883Z   path: cache_test/cache_test/**/*.txt
fetch   Restore cache_test/cache_test/**/*.md
fetch   Restore 
fetch   Restore 2021-11-10T22:59:50.8500449Z   key: multiple-files-
fetch   Restore 2021-11-10T22:59:50.8501079Z   restore-keys: multiple-files-
fetch   Restore multiple-files-
fetch   Restore 
fetch   Restore 2021-11-10T22:59:50.8501644Z ##[endgroup]
fetch   Restore 2021-11-10T22:59:53.1143793Z Received 257 of 257 (100.0%), 0.0 MBs/sec
fetch   Restore 2021-11-10T22:59:53.1145450Z Cache Size: ~0 MB (257 B)
fetch   Restore 2021-11-10T22:59:53.1163664Z [command]C:\Windows\System32\tar.exe -z -xf D:/a/_temp/30b0dc24-b25f-4713-b3d3-cecee7116785/cache.tgz -P -C D:/a/so-foobar-cache/so-foobar-cache
fetch   Restore 2021-11-10T22:59:53.1784328Z Cache restored successfully
fetch   Restore 2021-11-10T22:59:53.5197756Z Cache restored from key: multiple-files-
fetch   Print files (B) 2021-11-10T22:59:53.5483939Z ##[group]Run find cache_test -type f | xargs -t grep -e.
fetch   Print files (B) 2021-11-10T22:59:53.5484730Z find cache_test -type f | xargs -t grep -e.
fetch   Print files (B) 2021-11-10T22:59:53.5498140Z shell: C:\Program Files\Git\bin\bash.EXE --noprofile --norc -e -o pipefail {0}
fetch   Print files (B) 2021-11-10T22:59:53.5498674Z ##[endgroup]
fetch   Print files (B) 2021-11-10T22:59:55.8119800Z grep -e. cache_test/cache_test/subdir/a.txt cache_test/cache_test/subdir/b.md
fetch   Print files (B) 2021-11-10T22:59:56.1777887Z cache_test/cache_test/subdir/a.txt:Lorem ipsum
fetch   Print files (B) 2021-11-10T22:59:56.1784138Z cache_test/cache_test/subdir/b.md:dolor sit amet
fetch   Post Restore    2021-11-10T22:59:56.3890391Z Post job cleanup.
fetch   Post Restore    2021-11-10T22:59:56.5481739Z Cache hit occurred on the primary key multiple-files-, not saving cache.
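In the second job's log the key is printed as `multiple-files-` with no hash. `hashFiles` returns an empty string when its pattern matches no files, and the fetch job starts from a workspace where the files do not exist yet. One way to make both jobs use the same key is to compute it once in the store job and hand it over as a job output. The following is a sketch under those assumptions; the step id `key` and the output name `cache_key` are hypothetical, and the action version simply mirrors the workflow above.

```yaml
# Sketch only: compute the key where the files exist, reuse it downstream.
jobs:
  store:
    runs-on: windows-2019
    outputs:
      cache_key: ${{ steps.key.outputs.value }}
    steps:
      - name: Create files
        shell: bash
        run: |
          mkdir -p cache_test/cache_test/subdir
          printf '%s' "Lorem ipsum" > cache_test/cache_test/subdir/a.txt
          printf '%s' "dolor sit amet" > cache_test/cache_test/subdir/b.md
      - name: Compute key
        id: key
        shell: bash
        # The expression is evaluated when this step starts, after the files exist.
        run: |
          echo "value=multiple-files-${{ hashFiles('cache_test/cache_test/**') }}" >> "$GITHUB_OUTPUT"
      - name: Store in cache
        uses: actions/cache@v2
        with:
          path: |
            cache_test/cache_test/**/*.txt
            cache_test/cache_test/**/*.md
          key: ${{ steps.key.outputs.value }}

  fetch:
    runs-on: windows-2019
    needs: store
    steps:
      - name: Restore
        uses: actions/cache@v2
        with:
          path: |
            cache_test/cache_test/**/*.txt
            cache_test/cache_test/**/*.md
          # Reuse the exact key computed in the store job.
          key: ${{ needs.store.outputs.cache_key }}
```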

Upvotes: 12
