Reputation: 1195
I use the following GitHub Actions workflow for my C project. The workflow finishes in ~40 seconds, but more than half of that time is spent installing the valgrind package and its dependencies.
I believe caching could help me speed up the workflow. I do not mind waiting a couple of extra seconds, but this just seems like a pointless waste of GitHub's resources.
name: C Workflow
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - name: make
        run: make
      - name: valgrind
        run: |
          sudo apt-get install -y valgrind
          valgrind -v --leak-check=full --show-leak-kinds=all ./bin
Running sudo apt-get install -y valgrind installs the following packages:
gdb
gdbserver
libbabeltrace1
libc6-dbg
libipt1
valgrind
I know Actions support caching of a specific directory (and there are already several answered SO questions and articles about this), but I am not sure where all the different packages installed by apt end up. I assume /bin/ and /usr/bin/ are not the only directories affected by installing packages.
Is there an elegant way to cache the installed system packages for future workflow runs?
Upvotes: 68
Views: 26841
Reputation: 1
Using a service, it should be possible to run an apt-cacher-ng container to cache apt downloads. You would then configure apt-get to use the local proxy provided by this service container; the apt-cacher-ng docs have a how-to on setting up an apt proxy. The final step would be to cache the apt-cacher-ng cache directory via the github cache action.
If I ever get around to testing this, I will update this answer.
Locally I have an apt-cacher-ng container set up and its cache lives in ~/.dockercache/apt-cacher-ng, so I do believe the theory is sound.
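Concretely, once the service container is up, apt can be pointed at it with a one-line proxy config. A sketch, assuming the service is reachable under the hostname apt-cacher-ng on apt-cacher-ng's default port 3142 (the hostname is an assumption, not tested):

```shell
# Hypothetical sketch: route apt-get traffic through an apt-cacher-ng
# service container. The hostname "apt-cacher-ng" is an assumption;
# 3142 is the daemon's default listening port.
echo 'Acquire::http::Proxy "http://apt-cacher-ng:3142";' \
  | sudo tee /etc/apt/apt.conf.d/01proxy

# Subsequent installs download through (and populate) the proxy cache.
sudo apt-get update
sudo apt-get install -y valgrind
```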
Upvotes: 0
Reputation: 2727
Just for instance, there already exist several implementations:

https://github.com/awalsh128/cache-apt-pkgs-action
- uses apt-fast from https://git.io/vokNn instead of using apt-get directly (https://askubuntu.com/questions/52243/what-is-apt-fast-and-should-i-use-it)
- uses dpkg -L to enlist the changes
- stores ${cache_dir}/${installed_package}.tar (without compression); action/cache does the compression

https://github.com/airvzxf/cache-anything-new-action
- from the answer "Caching APT packages in GitHub Actions workflow"
- does not use dpkg -L, but finds all the changes in the file system

https://github.com/Mudlet/xmlstarlet-action
- runs xmlstarlet with arguments
- built from a Dockerfile and an entrypoint.sh, so it can not use an external script or instruction set
- slower than a plain apt-get install, but can be faster for multiple packages

Upvotes: 10
Reputation: 43068
The purpose of this answer is to show how caching can be done with github actions, not necessarily to show how to cache valgrind (which it does nonetheless). I also try to explain why not everything can or should be cached: the cost (in time) of caching and restoring a cache, versus reinstalling the dependency, needs to be taken into account.
You will make use of the actions/cache action to do this. Add it as a step (before you need to use valgrind):
- name: Cache valgrind
  uses: actions/cache@v2
  id: cache-valgrind
  with:
    path: "~/valgrind"
    key: ${{secrets.VALGRIND_VERSION}}
The next step should attempt to install the cached version if any or install from the repositories:
- name: Install valgrind
  env:
    CACHE_HIT: ${{steps.cache-valgrind.outputs.cache-hit}}
    VALGRIND_VERSION: ${{secrets.VALGRIND_VERSION}}
  run: |
    if [[ "$CACHE_HIT" == 'true' ]]; then
      sudo cp --verbose --force --recursive ~/valgrind/* /
    else
      sudo apt-get install --yes valgrind="$VALGRIND_VERSION"
      mkdir -p ~/valgrind
      sudo dpkg -L valgrind | while IFS= read -r f; do if test -f "$f"; then echo "$f"; fi; done | xargs cp --parents --target-directory ~/valgrind/
    fi
Set the VALGRIND_VERSION secret to the output of:
apt-cache policy valgrind | grep -oP '(?<=Candidate:\s)(.+)'
This will allow you to invalidate the cache when a new version is released, simply by changing the value of the secret.
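The grep pattern simply extracts the version from the Candidate: line of the apt-cache policy output. On sample output (version number fabricated for illustration):

```shell
# Feed fabricated "apt-cache policy" output through the same grep as
# above; -o prints only the match, and the lookbehind keeps the
# "Candidate: " label out of the match.
printf 'valgrind:\n  Installed: (none)\n  Candidate: 1:3.15.0-1ubuntu9\n' \
  | grep -oP '(?<=Candidate:\s)(.+)'
# prints: 1:3.15.0-1ubuntu9
```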
dpkg -L valgrind is used to list all the files installed when using sudo apt-get install valgrind.
What we can now do with this command is copy all the listed files to our cache folder:
dpkg -L valgrind | while IFS= read -r f; do if test -f "$f"; then echo "$f"; fi; done | xargs cp --parents --target-directory ~/valgrind/
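The key detail is cp --parents: it recreates each file's full directory path under ~/valgrind, which is what lets the restore step later copy the tree straight back onto /. A self-contained illustration of that round trip, using throwaway directories instead of real package files:

```shell
# Demo of the cache/restore round trip with cp --parents, using a
# temporary directory as a stand-in for the real filesystem root.
set -e
work=$(mktemp -d)
mkdir -p "$work/root/usr/bin" "$work/cache" "$work/restored"
echo 'fake binary' > "$work/root/usr/bin/valgrind"

# "Cache" step: --parents preserves the usr/bin/ path under the cache dir.
cd "$work/root"
cp --parents usr/bin/valgrind "$work/cache/"

# "Restore" step: copy the cached tree onto a stand-in root partition.
cp --force --recursive "$work/cache/." "$work/restored/"

cat "$work/restored/usr/bin/valgrind"   # prints: fake binary
```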
In addition to copying all the components of valgrind, it may also be necessary to copy its dependencies (such as libc in this case), but I don't recommend continuing along this path because the dependency chain only grows from there. To be precise, the dependencies needed to finally have an environment suitable for valgrind to run in are libc6, libgcc1 and gcc-8-base.
To copy all these dependencies, you can use the same syntax as above:
for dep in libc6 libgcc1 gcc-8-base; do
  dpkg -L "$dep" | while IFS= read -r f; do if test -f "$f"; then echo "$f"; fi; done | xargs cp --parents --target-directory ~/valgrind/
done
Is all this work really worth the trouble when all that is required to install valgrind in the first place is to simply run sudo apt-get install valgrind? If your goal is to speed up the build process, then you also have to take into consideration the time spent restoring (downloading and extracting) the cache versus simply running the command again to install valgrind.
And finally, to restore the cache, assuming it is stored at /tmp/valgrind, you can use the command:
cp --force --recursive /tmp/valgrind/* /
which will basically copy all the files from the cache onto the root partition.
In addition to the process above, I also have an example of "caching valgrind" by installing and compiling it from source. The cache ends up at about 63MB (compressed), and one still needs to install libc separately, which kind of defeats the purpose.
Note: Another answer to this question proposes what I could consider to be a safer approach to caching dependencies, by using a container which comes with the dependencies pre-installed. The best part is that you can use actions to keep those containers up-to-date.
Upvotes: 52
Reputation: 415
Updated: I created a GitHub action which works like this solution, with less code and better optimizations: Cache Anything New.
This solution is similar to the most voted one. I tried the proposed solution, but it didn't work for me because I was installing texlive-latex and pandoc, which have many dependencies and sub-dependencies.
I created a solution which should help many people. One case is when you install a couple of packages (apt install); the other is when you make a program and it takes a while to build.
Solution:
find
to create a list of all the files in the container.make
the programs, whatever that you want to cache.find
to create a list of all the files in the container.diff
to get the new created files.actions/cache@v2
./
.When to use this?
Implementation:
Source code: .github/workflows
Landing page of my actions: workflows.
release.yml
name: CI - Release books
on:
  release:
    types: [ released ]
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/checkout@v2
      - uses: actions/cache@v2
        id: cache-packages
        with:
          path: ${{ runner.temp }}/cache-linux
          key: ${{ runner.os }}-cache-packages-v2.1
      - name: Install packages
        if: steps.cache-packages.outputs.cache-hit != 'true'
        env:
          SOURCE: ${{ runner.temp }}/cache-linux
        run: |
          set +xv
          echo "# --------------------------------------------------------"
          echo "# Action environment variables"
          echo "github.workspace: ${{ github.workspace }}"
          echo "runner.workspace: ${{ runner.workspace }}"
          echo "runner.os: ${{ runner.os }}"
          echo "runner.temp: ${{ runner.temp }}"
          echo "# --------------------------------------------------------"
          echo "# Where am I?"
          pwd
          echo "SOURCE: ${SOURCE}"
          ls -lha /
          sudo du -h -d 1 / 2> /dev/null || true
          echo "# --------------------------------------------------------"
          echo "# APT update"
          sudo apt update
          echo "# --------------------------------------------------------"
          echo "# Set up snapshot"
          mkdir -p "${{ runner.temp }}"/snapshots/
          echo "# --------------------------------------------------------"
          echo "# Install tools"
          sudo rm -f /var/lib/apt/lists/lock
          #sudo apt install -y vim bash-completion
          echo "# --------------------------------------------------------"
          echo "# Take first snapshot"
          sudo find / \
            -type f,l \
            -not \( -path "/sys*" -prune \) \
            -not \( -path "/proc*" -prune \) \
            -not \( -path "/mnt*" -prune \) \
            -not \( -path "/dev*" -prune \) \
            -not \( -path "/run*" -prune \) \
            -not \( -path "/etc/mtab*" -prune \) \
            -not \( -path "/var/cache/apt/archives*" -prune \) \
            -not \( -path "/tmp*" -prune \) \
            -not \( -path "/var/tmp*" -prune \) \
            -not \( -path "/var/backups*" \) \
            -not \( -path "/boot*" -prune \) \
            -not \( -path "/vmlinuz*" -prune \) \
            > "${{ runner.temp }}"/snapshots/snapshot_01.txt 2> /dev/null \
            || true
          echo "# --------------------------------------------------------"
          echo "# Install pandoc and dependencies"
          sudo apt install -y texlive-latex-extra wget
          wget -q https://github.com/jgm/pandoc/releases/download/2.11.2/pandoc-2.11.2-1-amd64.deb
          sudo dpkg -i pandoc-2.11.2-1-amd64.deb
          rm -f pandoc-2.11.2-1-amd64.deb
          echo "# --------------------------------------------------------"
          echo "# Take second snapshot"
          sudo find / \
            -type f,l \
            -not \( -path "/sys*" -prune \) \
            -not \( -path "/proc*" -prune \) \
            -not \( -path "/mnt*" -prune \) \
            -not \( -path "/dev*" -prune \) \
            -not \( -path "/run*" -prune \) \
            -not \( -path "/etc/mtab*" -prune \) \
            -not \( -path "/var/cache/apt/archives*" -prune \) \
            -not \( -path "/tmp*" -prune \) \
            -not \( -path "/var/tmp*" -prune \) \
            -not \( -path "/var/backups*" \) \
            -not \( -path "/boot*" -prune \) \
            -not \( -path "/vmlinuz*" -prune \) \
            > "${{ runner.temp }}"/snapshots/snapshot_02.txt 2> /dev/null \
            || true
          echo "# --------------------------------------------------------"
          echo "# Filter new files"
          diff -C 1 \
            --color=always \
            "${{ runner.temp }}"/snapshots/snapshot_01.txt \
            "${{ runner.temp }}"/snapshots/snapshot_02.txt \
            | grep -E "^\+" \
            | sed -E s/..// \
            > "${{ runner.temp }}"/snapshots/snapshot_new_files.txt
          < "${{ runner.temp }}"/snapshots/snapshot_new_files.txt wc -l
          ls -lha "${{ runner.temp }}"/snapshots/
          echo "# --------------------------------------------------------"
          echo "# Make cache directory"
          rm -fR "${SOURCE}"
          mkdir -p "${SOURCE}"
          while IFS= read -r LINE
          do
            sudo cp -a --parent "${LINE}" "${SOURCE}"
          done < "${{ runner.temp }}"/snapshots/snapshot_new_files.txt
          ls -lha "${SOURCE}"
          echo ""
          sudo du -sh "${SOURCE}" || true
          echo "# --------------------------------------------------------"
      - name: Copy cached packages
        if: steps.cache-packages.outputs.cache-hit == 'true'
        env:
          SOURCE: ${{ runner.temp }}/cache-linux
        run: |
          echo "# --------------------------------------------------------"
          echo "# Using Cached packages"
          ls -lha "${SOURCE}"
          sudo cp --force --recursive "${SOURCE}"/. /
          echo "# --------------------------------------------------------"
      - name: Generate release files and commit in GitHub
        run: |
          echo "# --------------------------------------------------------"
          echo "# Generating release files"
          git fetch --all
          git pull --rebase origin main
          git checkout main
          cd ./src/programming-from-the-ground-up
          ./make.sh
          cd ../../
          ls -lha release/
          git config --global user.name 'Israel Roldan'
          git config --global user.email '[email protected]'
          git add .
          git status
          git commit -m "Automated Release."
          git push
          git status
          echo "# --------------------------------------------------------"
Explaining some pieces of the code:
Here is the cache action; it takes a key, which is generated once and compared against in later executions. The path is the directory whose files are used to generate the compressed cache file.
- uses: actions/cache@v2
  id: cache-packages
  with:
    path: ${{ runner.temp }}/cache-linux
    key: ${{ runner.os }}-cache-packages-v2.1
These conditionals check whether the cache key was found; if it exists, cache-hit is 'true'.
if: steps.cache-packages.outputs.cache-hit != 'true'
if: steps.cache-packages.outputs.cache-hit == 'true'
It's not critical, but the first time the du command executes, Linux indexes all the files (5~8 minutes); when find runs afterwards, it takes only ~50 seconds to list all the files. You can delete this line if you want.
The trailing || true prevents the command from returning a non-zero exit status; without it, the action would stop, because a command that exits with an error (even with its messages hidden by 2> /dev/null) fails the step. You will see a couple of these during the script.
sudo du -h -d 1 / 2> /dev/null || true
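The effect on the exit status can be seen in isolation; this is general shell behaviour, not specific to the workflow:

```shell
# A workflow "run:" step behaves roughly like "set -e": any command
# that exits non-zero aborts the step. "|| true" masks that exit code.
set -e
ls /nonexistent-path 2> /dev/null || true   # ls fails, || true rescues it
echo "still running"                        # prints: still running
```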
This is the magical part: use find to generate a list of the actual files, excluding some directories to optimize the cache folder. It will also be executed after the installations and the make of the programs; for the second snapshot the output file name should be different: snapshot_02.txt.
sudo find / \
-type f,l \
-not \( -path "/sys*" -prune \) \
-not \( -path "/proc*" -prune \) \
-not \( -path "/mnt*" -prune \) \
-not \( -path "/dev*" -prune \) \
-not \( -path "/run*" -prune \) \
-not \( -path "/etc/mtab*" -prune \) \
-not \( -path "/var/cache/apt/archives*" -prune \) \
-not \( -path "/tmp*" -prune \) \
-not \( -path "/var/tmp*" -prune \) \
-not \( -path "/var/backups*" \) \
-not \( -path "/boot*" -prune \) \
-not \( -path "/vmlinuz*" -prune \) \
> "${{ runner.temp }}"/snapshots/snapshot_01.txt 2> /dev/null \
|| true
Install some packages and pandoc
.
sudo apt install -y texlive-latex-extra wget
wget -q https://github.com/jgm/pandoc/releases/download/2.11.2/pandoc-2.11.2-1-amd64.deb
sudo dpkg -i pandoc-2.11.2-1-amd64.deb
rm -f pandoc-2.11.2-1-amd64.deb
Generate the text file with the newly added files; the files could be symbolic links, too.
diff -C 1 \
"${{ runner.temp }}"/snapshots/snapshot_01.txt \
"${{ runner.temp }}"/snapshots/snapshot_02.txt \
| grep -E "^\+" \
| sed -E s/..// \
> "${{ runner.temp }}"/snapshots/snapshot_new_files.txt
At the end, copy all the new files into the cache directory; cp -a (archive mode) preserves the original file attributes.
while IFS= read -r LINE
do
sudo cp -a --parent "${LINE}" "${SOURCE}"
done < "${{ runner.temp }}"/snapshots/snapshot_new_files.txt
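Taken together, the snapshot, diff and copy steps can be exercised end-to-end in a small self-contained sketch. A throwaway directory stands in for the runner's filesystem, and creating a file stands in for the package installation (all paths are illustrative):

```shell
# Self-contained sketch of the snapshot/diff/copy technique.
set -e
root=$(mktemp -d)    # stand-in for / on the runner
cache=$(mktemp -d)   # stand-in for the cache-linux directory
snap=$(mktemp -d)    # stand-in for the snapshots directory
echo 'old' > "$root/preexisting"

# First snapshot (sorted here for a stable diff).
find "$root" -type f,l | sort > "$snap/snapshot_01.txt"

# "Install packages" (simulated by creating one new file).
mkdir -p "$root/usr/bin"
echo 'new tool' > "$root/usr/bin/pandoc"

# Second snapshot.
find "$root" -type f,l | sort > "$snap/snapshot_02.txt"

# Added lines in a context diff start with "+ "; strip that prefix.
diff -C 1 "$snap/snapshot_01.txt" "$snap/snapshot_02.txt" \
  | grep -E '^\+' | sed -E 's/..//' > "$snap/snapshot_new_files.txt"

# Copy only the new files into the cache, preserving their paths.
while IFS= read -r LINE; do
  cp -a --parents "$LINE" "$cache"
done < "$snap/snapshot_new_files.txt"

cat "$snap/snapshot_new_files.txt"   # only the newly created file is listed
```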
This step copies all the cached files into the main path /.
- name: Copy cached packages
  if: steps.cache-packages.outputs.cache-hit == 'true'
  env:
    SOURCE: ${{ runner.temp }}/cache-linux
  run: |
    echo "# --------------------------------------------------------"
    echo "# Using Cached packages"
    ls -lha "${SOURCE}"
    sudo cp --force --recursive "${SOURCE}"/. /
    echo "# --------------------------------------------------------"
This step is where I use the packages restored from the cache: the ./make.sh script uses pandoc to do some conversions. As I mentioned, you can create other steps which use the cache benefits, or others which do not use the cache.
- name: Generate release files and commit in GitHub
  run: |
    echo "# --------------------------------------------------------"
    echo "# Generating release files"
    cd ./src/programming-from-the-ground-up
    ./make.sh
Upvotes: 10
Reputation: 5258
You could create a docker image with valgrind preinstalled and run your workflow in that container.
Create a Dockerfile with something like:
FROM ubuntu
RUN apt-get update && apt-get install -y valgrind
Build it and push it to dockerhub:
docker build -t natiiix/valgrind .
docker push natiiix/valgrind
Then use something like the following as your workflow:
name: C Workflow
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest # required; the job then runs inside the container
    container: natiiix/valgrind
    steps:
      - uses: actions/checkout@v1
      - name: make
        run: make
      - name: valgrind
        run: valgrind -v --leak-check=full --show-leak-kinds=all ./bin
Completely untested, but you get the idea.
Upvotes: 45