Reputation: 16731
I have a private repository at gitlab.com that uses the CI feature. Some of the CI jobs create artifact files that are stored. I just configured the artifacts to be deleted automatically after one day by adding this to the CI configuration:
expire_in: 1 day
That works great - however, old artifacts won't be deleted (as expected). So my question is:
How can I delete old artifacts or artifacts that do not expire? (on gitlab.com, no direct access to the server)
Upvotes: 40
Views: 66208
Reputation: 171
I wrote this script to remove artifacts with the GraphQL API:
import { GraphQLClient } from 'graphql-request'
function graphQLClient(jwt) {
  return new GraphQLClient("https://gitlab.com/api/graphql", {
    headers: {
      authorization: `Bearer ${jwt}`,
    },
  })
}
const projectPath = "GROUPE/PROJECT"
// Go to: https://gitlab.com/profile/personal_access_tokens
// GitLab personal access token
const GRAPHQL_TOKEN = "REPLACE_ME"
const PROJECT_ID = REPLACE_ME; // a number
// Fetches one page of jobs with artifacts, deletes those artifacts,
// and resolves to the pageInfo of the page that was fetched.
async function removeArtifacts(pageSize = 50, cursor = '') {
  const jobArtifactsQuery = `
    query getJobArtifacts(
      $projectPath: ID!
      $firstPageSize: Int
      $lastPageSize: Int
      $prevPageCursor: String = ""
      $nextPageCursor: String = ""
    ) {
      project(fullPath: $projectPath) {
        id
        __typename
        jobs(
          withArtifacts: true
          first: $firstPageSize
          last: $lastPageSize
          after: $nextPageCursor
          before: $prevPageCursor
        ) {
          nodes {
            artifacts {
              nodes {
                id
                expireAt
                __typename
              }
              __typename
            }
            __typename
          }
          pageInfo {
            ...PageInfo
            __typename
          }
          __typename
        }
        __typename
      }
    }
    fragment PageInfo on PageInfo {
      hasNextPage
      hasPreviousPage
      startCursor
      endCursor
      __typename
    }
  `
  const variables = {
    projectPath: projectPath,
    firstPageSize: pageSize,
    lastPageSize: null,
    nextPageCursor: cursor,
    prevPageCursor: ""
  }
  const client = graphQLClient(GRAPHQL_TOKEN)
  console.log("Fetching artifacts...");
  const list = await client.request(jobArtifactsQuery, variables)
  // Collect every artifact ID on this page, quoted for use in the mutation below.
  const artifactsIds = list.project.jobs.nodes
    .flatMap(job => job.artifacts.nodes.map(artifact => `"${artifact.id}"`))
  console.log(artifactsIds)
  if (artifactsIds.length === 0) {
    console.log("No artifacts found");
    return list.project.jobs.pageInfo
  }
  const bulkDestroyJobArtifacts = `
    mutation {
      bulkDestroyJobArtifacts(input: {
        projectId: "gid://gitlab/Project/${PROJECT_ID}",
        ids: [${artifactsIds}]
      }) {
        destroyedCount
        destroyedIds
        errors
      }
    }
  `
  console.log("Deleting artifacts...");
  const destroyed = await client.request(bulkDestroyJobArtifacts)
  console.log(destroyed.bulkDestroyJobArtifacts)
  const queryArtifactsSize = `
    query getBuildArtifactsSize($projectPath: ID!) { project(fullPath: $projectPath) { id statistics { buildArtifactsSize __typename } __typename }}
  `
  const stats = await client.request(queryArtifactsSize, { projectPath: projectPath })
  console.log(stats.project.statistics.buildArtifactsSize)
  return list.project.jobs.pageInfo
}
async function main(cursor = "") {
  // Walk the pages one at a time until GitLab reports no further page.
  let pageInfo = await removeArtifacts(99, cursor);
  while (pageInfo.hasNextPage) {
    pageInfo = await removeArtifacts(99, pageInfo.endCursor);
  }
  console.log("Done!");
}
const timeTaken = "Time taken by removeArtifacts function";
console.time(timeTaken);
// Stop the timer only once the asynchronous work has finished.
main().finally(() => console.timeEnd(timeTaken));
Upvotes: 1
Reputation: 1325427
An API call should be easier to script, with GitLab 14.7 (January 2022), which now offers:
Bulk delete artifacts with the API
While a good strategy for managing storage consumption is to set regular expiration policies for artifacts, sometimes you need to reduce items in storage right away.
Previously, you might have used a script to automate the tedious task of deleting artifacts one by one with API calls, but now you can use a new API endpoint to bulk delete job artifacts quickly and easily.
See Documentation, Issue 223793 and Merge Request 75488.
curl --request DELETE --header "PRIVATE-TOKEN: <your_access_token>" \
"https://gitlab.example.com/api/v4/projects/1/artifacts"
As noted by Lubo in the comments:
Response of given API is 202 Accepted. It means for me, that deletion will happen on background.
Also admin area is updated a bit later than deletion happens
As noted by Lorenz Leitner in the comments, the bulk delete API endpoint (introduced in GitLab 14.7) will not necessarily remove artifacts that are protected by the "Keep the latest artifacts for all jobs in the latest successful pipelines" setting.
If you want the bulk delete API to work on these locked artifacts, you need to disable this setting at the project or instance level (depending on configuration).
Disabling it unlocks them for deletion, but it will not immediately remove them (application settings have a cache expiry). A new pipeline needs to run before the locked artifacts become eligible for deletion.
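For scripting without curl, the same call can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function name build_bulk_delete_request is hypothetical, and the host, project ID and token are the placeholders from the curl example above. Nothing is sent until the request is passed to urllib.request.urlopen.

```python
import urllib.request


def build_bulk_delete_request(base_url, project_id, token):
    """Build (but do not send) the DELETE request for a project's artifacts.

    Hypothetical helper: it mirrors the curl call above and nothing more.
    """
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/artifacts"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"PRIVATE-TOKEN": token},
    )


if __name__ == "__main__":
    req = build_bulk_delete_request(
        "https://gitlab.example.com", 1, "<your_access_token>"
    )
    print(req.get_method(), req.full_url)
    # To actually send it (assumes a valid token with sufficient permissions):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)  # 202 would mean the deletion was queued
```

Keeping the request construction separate from sending makes it easy to check the URL before anything is deleted.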
Upvotes: 30
Reputation: 581
If you have deleted all the jobs by accident (thinking the artifacts would be gone, but they weren't), what would be the alternative to brute-forcing a loop over a range?
I have this code, which brute-forces a range of job IDs. But since I use the gitlab.com public runners, it's a long range.
# project_id, find it here: https://gitlab.com/[organization name]/[repository name]/edit inside the "General project settings" tab
project_id="xxxxxx"
# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="yyyyy"
server="gitlab.com"
# Take the range from the oldest known job ID to the latest known one, then brute-force it. Used when you deleted the pipelines and can't retrieve the job IDs.
# https://stackoverflow.com/questions/52609966/for-loop-over-sequence-of-large-numbers-in-bash
for (( job_id = 59216999; job_id <= 190239535; job_id++ )); do
  echo "Job ID being deleted is $job_id"
  curl --request POST --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/erase"
  echo -en '\n\n'
done
Upvotes: 2
Reputation: 1888
None of the API-based solutions worked for me because the DELETE API only sets the expiry date on the build. It is then up to Sidekiq to perform the deletion. If a bug in GitLab causes the expiry date on artifacts to be ignored even when it is set, nothing will happen.
Builds can also be removed manually with rails runner. The script below cleans up both artifacts and job logs that are older than 1 month. Note that it only does so for the top 20 projects by disk space consumption.
#!/usr/bin/env ruby
# This is a Ruby script to delete build artifacts from GitLab that are older than 1 month
# Copy this file to the /tmp/ folder on the GitLab server, then execute rails runner as below
# gitlab-rails runner /tmp/cleanupArtifacts.rb
include ActionView::Helpers::NumberHelper
ProjectStatistics.order(build_artifacts_size: :desc).limit(20).each do |s|
  builds_artifacts = s.project.builds.with_downloadable_artifacts
  counter = 0 # number of builds inspected for this project
  builds_artifacts.find_each do |build|
    counter += 1
    puts "Build #{build.id} \t created at #{build.created_at}"
    if build.created_at < 1.month.ago
      puts "Build #{build.id} marked for deletion"
      # Destroys the whole build record, including its artifacts and job log
      build.destroy!
    end
  end
  puts "#{number_to_human_size(s.build_artifacts_size)} \t #{s.project.full_path} \t Builds: #{counter}"
end
Upvotes: 0
Reputation: 19168
Although the answers here are pretty nicely summarised, I am just adding on the Python script used by me for manually cleaning only the artefacts on the latest version of GitLab linux installation (v16.10.3-ee).
I first retrieved the number of pages and projects I have in our GitLab environment, using the values "x-total-pages" and "x-total" of the command curl https://gitlab.company/api/v4/projects?private_token=<token> --head. Then, I iterated through the GitLab projects in the paginated manner, and retrieved the project-IDs into a list. Lastly, I iterated through this list to perform the necessary artefact cleanup.
# This is a sample Python script based on the idea from https://stackoverflow.com/a/70817349/3482140
import json
import requests
def clean_gitlab_artefact():
    base_url = "https://gitlab.company"
    access_token = "access-token"  # check with your GitLab project owner
    print('GET /version')
    x = requests.get(f"{base_url}/api/v4/version", headers={"PRIVATE-TOKEN": access_token})
    print(x)
    data = json.loads(x.text)
    print(f'Using GitLab version {data["version"]}. Implemented on 16.10.3-ee!')
    # There were 173 projects at the time of running this script, which can be checked by exploring the
    # value "x-total" of the command `curl https://gitlab.company/api/v4/projects?private_token=<token> --head`
    # With per_page=100 that is 2 pages, hence the loop stops before page 3.
    page = 1
    total_project_ids = []
    while page != 3:
        print('GET /project-IDs')
        projects = requests.get(f"{base_url}/api/v4/projects?per_page=100&page={page}",
                                headers={"PRIVATE-TOKEN": access_token})
        page += 1
        # print(projects)
        data = json.loads(projects.text)
        project_ids = [o["id"] for o in data]
        total_project_ids += project_ids
    print(total_project_ids)
    for project_id in total_project_ids:
        request_str = f'projects/{project_id}/artifacts'
        url = f'{base_url}/api/v4/{request_str}'
        print(f'DELETE /{request_str}')
        x = requests.delete(url, headers={"PRIVATE-TOKEN": access_token})
        print(x)
if __name__ == '__main__':
    clean_gitlab_artefact()
The script returns quickly (with a 202 Accepted response when the artefacts are queued for deletion), because the cleanup happens asynchronously in the background, as also explained in other answers here.
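The page count does not have to be hard-coded in the loop: it can be taken from the "x-total-pages" header mentioned above. A rough sketch, assuming the header is present in the response (the helper name project_page_urls is made up for illustration, and it falls back to a single page when the header is missing):

```python
def project_page_urls(base_url, headers, per_page=100):
    """Return one /projects URL per page, based on the x-total-pages header.

    `headers` is any mapping of response headers, for example the
    `.headers` attribute of a `requests` response to a HEAD or GET call.
    """
    # Look the header up case-insensitively, since plain dicts are case-sensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    total_pages = int(lowered.get("x-total-pages", 1))
    return [
        f"{base_url}/api/v4/projects?per_page={per_page}&page={page}"
        for page in range(1, total_pages + 1)
    ]
```

The header only matches if it comes from a request made with the same per_page value, since the number of pages depends on the page size.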
Upvotes: 0
Reputation: 664
This Python solution worked for me with GitLab 13.11.3.
#!/bin/python3
# delete_artifacts.py
import json
import requests
# adapt accordingly
base_url='https://gitlab.example.com'
project_id='1234'
access_token='123412341234'
#
# Get Version Tested with Version 13.11.3
# cf. https://docs.gitlab.com/ee/api/version.html#version-api
#
print(f'GET /version')
x= (requests.get(f"{base_url}/api/v4/version", headers = {"PRIVATE-TOKEN": access_token }))
print(x)
data=json.loads(x.text)
print(f'Using GitLab version {data["version"]}. Tested with 13.11.3')
#
# List project jobs
# cf. https://docs.gitlab.com/ee/api/jobs.html#list-project-jobs
#
request_str=f'projects/{project_id}/jobs'
url=f'{base_url}/api/v4/{request_str}'
print(f'GET /{request_str}')
x= (requests.get(url, headers = {"PRIVATE-TOKEN": access_token }))
print(x)
data=json.loads(x.text)
input('WARNING: This will delete all artifacts. Job logs will remain available. Press Enter to continue...' )
#
# Delete job artifacts
# cf. https://docs.gitlab.com/ee/api/job_artifacts.html#delete-artifacts
#
for entry in data:
    request_str=f'projects/{project_id}/jobs/{entry["id"]}/artifacts'
    url=f'{base_url}/api/v4/{request_str}'
    print(f'DELETE /{request_str}')
    x = requests.delete(url, headers = {"PRIVATE-TOKEN": access_token })
    print(x)
I'll keep an updated version here. Feel free to reach out and improve the code.
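To target only the artifacts that never expire, as the question asks, the job list could be filtered before the delete loop. This is a sketch under assumptions: it relies on the artifacts_file and artifacts_expire_at fields of the jobs API response, which should be verified against your GitLab version, and the function name is hypothetical.

```python
def jobs_with_non_expiring_artifacts(jobs):
    """Return the IDs of jobs that have an artifacts archive and no expiry date.

    `jobs` is the decoded JSON list from GET /projects/:id/jobs.
    Assumption: a job with an archive carries a non-empty `artifacts_file`
    entry, and `artifacts_expire_at` is null when no expiry is set.
    """
    return [
        job["id"]
        for job in jobs
        if job.get("artifacts_file") and job.get("artifacts_expire_at") is None
    ]
```

The returned IDs can then be used in place of every entry["id"] in the delete loop above.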
Upvotes: 2
Reputation: 427
If you don't mind removing entire jobs along with their artifacts in bulk, you can use the glab CLI like this:
glab ci delete --dry-run --older-than 8760h --paginate
This removes all jobs older than 1 year. Just remove --dry-run to make it happen.
The artifacts seem to be deleted asynchronously, so it may take some time for your repository's storage usage to be updated.
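The 8760h value is simply 365 days expressed in hours. For a different cutoff, the conversion can be sketched like this (the helper name older_than_flag is made up, and it assumes the hour-based duration format used in the command above):

```python
def older_than_flag(days):
    """Convert a number of days into an hour-based duration string."""
    return f"{days * 24}h"


if __name__ == "__main__":
    print(older_than_flag(365))  # value used for --older-than in the command above
    print(older_than_flag(30))   # roughly one month
```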
Upvotes: 4
Reputation: 2544
As you said, it's possible to change the retention time by adding an artifacts:expire_in field in the job settings.
In my testing in GitLab 15.10.2-ee, the setting is applied retroactively to all matching jobs in the history. The deletion is not instant: it presumably happens once some scheduled job runs, probably once a day.
You could also change the instance-wide setting but that doesn't apply to past artifacts.
Upvotes: 0
Reputation: 1246
Building on top of @David's answer, @Philipp pointed out that there is now an API endpoint to delete only the job artifacts instead of the entire job.
You can run this script directly in the browser's Dev Tools console, or use node-fetch to run in node.js.
//Go to: https://gitlab.com/profile/personal_access_tokens
const API_KEY = "API_KEY";
//You can find project id inside the "General project settings" tab
const PROJECT_ID = 12345678;
const PROJECT_URL = "https://gitlab.com/api/v4/projects/" + PROJECT_ID + "/"
let jobs = [];
for (let i = 0, currentJobs = []; i == 0 || currentJobs.length > 0; i++) {
  currentJobs = await sendApiRequest(
    PROJECT_URL + "jobs/?per_page=100&page=" + (i + 1)
  ).then(e => e.json());
  jobs = jobs.concat(currentJobs);
}
//skip jobs without artifacts (an empty array is truthy, so check its length)
jobs = jobs.filter(e => e.artifacts && e.artifacts.length > 0);
//keep the latest build.
jobs.shift();
for (let job of jobs) {
  await sendApiRequest(
    PROJECT_URL + "jobs/" + job.id + "/artifacts",
    {method: "DELETE"}
  );
}
async function sendApiRequest(url, options = {}) {
  if (!options.headers)
    options.headers = {};
  options.headers["PRIVATE-TOKEN"] = API_KEY;
  return fetch(url, options);
}
Upvotes: 20
Reputation: 2121
You can use the GitLab REST API to delete the artifacts from the jobs if you don't have direct access to the server. Here's a sample curl script that uses the API:
#!/bin/bash
# project_id, find it here: https://gitlab.com/[organization name]/[repository name]/edit inside the "General project settings" tab
project_id="3034900"
# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="Lifg_azxDyRp8eyNFRfg"
server="gitlab.com"
# go to https://gitlab.com/[organization name]/[repository name]/-/jobs
# then open JavaScript console
# copy/paste => copy(_.uniq($('.ci-status').map((x, e) => /([0-9]+)/.exec(e.href)).toArray()).join(' '))
# press enter, and then copy the result here :
# repeat for every page you want
job_ids=(48875658 48874137 48873496 48872419)
for job_id in "${job_ids[@]}"
do
  URL="https://$server/api/v4/projects/$project_id/jobs/$job_id/erase"
  echo "$URL"
  curl --request POST --header "PRIVATE-TOKEN:${token}" "$URL"
  # print a newline after the curl output
  echo
done
Upvotes: 23
Reputation: 57
I am on GitLab 8.17 and am able to remove the artifacts for a particular job by navigating to the storage directory on the server itself. The default path is:
/var/opt/gitlab/gitlab-rails/shared/artifacts/<year_month>/<project_id?>/<jobid>
Removing either the whole folder for a job or just its contents makes the artifact view disappear from the GitLab pipeline page.
The storage path can be changed as described in docs:
https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/administration/job_artifacts.md#storing-job-artifacts
Upvotes: 4
Reputation: 15145
According to the documentation, deleting the entire job log (click on the trash can) will also delete the artifacts.
Upvotes: 12