user1251007

Reputation: 16731

Remove artifacts from CI manually

I have a private repository at gitlab.com that uses the CI feature. Some of the CI jobs create artifact files that are stored. I just configured the artifacts to be deleted automatically after one day by adding this to the CI configuration:

expire_in: 1 day

That works great - however, old artifacts won't be deleted (as expected). So my question is:

How can I delete old artifacts or artifacts that do not expire? (on gitlab.com, no direct access to the server)

Upvotes: 40

Views: 66208

Answers (12)

mauriau

Reputation: 171

I wrote this script to remove artifacts with the GraphQL API:

import { GraphQLClient } from 'graphql-request'

function graphQLClient(jwt) {
    return new GraphQLClient("https://gitlab.com/api/graphql", {
        headers: {
            authorization: `Bearer ${jwt}`,
        },
    })
}

const projectPath = "GROUPE/PROJECT"

//Go to: https://gitlab.com/profile/personal_access_tokens
// GITLAB PERSONAL API TOKEN
const GRAPHQL_TOKEN="REPLACE_ME"
const PROJECT_ID = REPLACE_ME; // a number

async function removeArtifacts(pageSize = 50, cursor = '') {


    const jobArtifactsQuery = `
            query getJobArtifacts(
              $projectPath: ID!
              $firstPageSize: Int
              $lastPageSize: Int
              $prevPageCursor: String = ""
              $nextPageCursor: String = ""
            ) {
              project(fullPath: $projectPath) {
                id
                  __typename
                jobs(
                  withArtifacts: true
                  first: $firstPageSize
                  last: $lastPageSize
                  after: $nextPageCursor
                  before: $prevPageCursor
                ) {
                  nodes {
                    artifacts {
                      nodes {
                        id
                        expireAt
                        __typename
                      }
                      __typename
                    }
                    __typename
                  }
                  pageInfo {
                    ...PageInfo
                    __typename
                  }
                  __typename
                }
                __typename
              }
            }
            fragment PageInfo on PageInfo {
              hasNextPage
              hasPreviousPage
              startCursor
              endCursor
              __typename
            }
    `



    const variables = {
        projectPath: projectPath,
        firstPageSize: pageSize,
        lastPageSize: null,
        nextPageCursor: cursor,
        prevPageCursor: ""
    }
    let artifactsIds = []
    console.log("Fetching artifacts...");
    const pageInfo =  graphQLClient(GRAPHQL_TOKEN).request(jobArtifactsQuery, variables).then(list => {
        const artifacts = list.project.jobs.nodes.map(e => e.artifacts.nodes.map(f => f.id))
        artifactsIds = artifacts.flat().map(item => `"${item}"`);
        console.log(artifactsIds)
        if(artifactsIds.length === 0) {
            console.log("No artifacts found");
            return list.project.jobs.pageInfo
        }

        const bulkDestroyJobArtifacts = `
        mutation {
              bulkDestroyJobArtifacts(input:{
                projectId: "gid://gitlab/Project/${PROJECT_ID}",
                ids: [${artifactsIds}]
              }) {
                 destroyedCount
                 destroyedIds
                 errors
              }
        }
        `
        console.log("Deleting artifacts...");
        graphQLClient(GRAPHQL_TOKEN).request(bulkDestroyJobArtifacts).then(r => console.log(r.bulkDestroyJobArtifacts   ))

        const queryArtifactsSize = `
    query getBuildArtifactsSize($projectPath: ID!) {  project(fullPath: $projectPath) {    id    statistics {      buildArtifactsSize      __typename    }    __typename  }}
    `

        graphQLClient(GRAPHQL_TOKEN).request(queryArtifactsSize, {projectPath: projectPath}).then(r => console.log(r.project.statistics.buildArtifactsSize))
        return list.project.jobs.pageInfo
    })

    console.log(artifactsIds);

    return pageInfo;
}
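async function main(cursor = "") {
    // Process one page at a time until there is no next page.
    let pageInfo = await removeArtifacts(99, cursor);
    while (pageInfo.hasNextPage) {
        pageInfo = await removeArtifacts(99, pageInfo.endCursor);
    }
    console.log("Done!");
}
const timeTaken = "Time taken by removeArtifacts function";
console.time(timeTaken);

// Stop the timer only once every page has been processed.
main().finally(() => console.timeEnd(timeTaken));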

Upvotes: 1

VonC

Reputation: 1325427

An API call should be easier to script, with GitLab 14.7 (January 2022), which now offers:

Bulk delete artifacts with the API

While a good strategy for managing storage consumption is to set regular expiration policies for artifacts, sometimes you need to reduce items in storage right away.

Previously, you might have used a script to automate the tedious task of deleting artifacts one by one with API calls, but now you can use a new API endpoint to bulk delete job artifacts quickly and easily.

See Documentation, Issue 223793 and Merge Request 75488.

 curl --request DELETE --header "PRIVATE-TOKEN: <your_access_token>" \
      "https://gitlab.example.com/api/v4/projects/1/artifacts"

As noted by Lubo in the comments:

The response of the given API is 202 Accepted. To me, that means the deletion will happen in the background.

Also, the admin area is updated a bit later than the deletion happens.


As noted by Lorenz Leitner in the comments, the bulk delete API endpoint (introduced in GitLab 14.7) will not necessarily remove artifacts that are protected by the "Keep the latest artifacts for all jobs in the latest successful pipelines" setting.

If you want the bulk delete API to work on these locked artifacts, you need to disable this setting at the project or instance level (depending on configuration).
Disabling it unlocks them for deletion, but it will not immediately remove them (application settings have a cache expiry). A new pipeline needs to run before the locked artifacts become eligible for deletion.
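
As a rough sketch of the two steps described above (turn off the keep-latest setting for one project, then request the bulk delete), the following Python builds both requests with the standard library only. The `keep_latest_artifact` attribute is the name I understand the projects API to use for this setting, so treat it as an assumption and check it against your GitLab version. The host, project ID and token are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_disable_keep_latest(base_url, project_id, token):
    """Build a PUT request that turns off 'keep latest artifacts' for one project.

    Assumption: the project attribute is called `keep_latest_artifact`.
    """
    body = json.dumps({"keep_latest_artifact": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v4/projects/{project_id}",
        data=body,
        method="PUT",
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )


def build_bulk_delete(base_url, project_id, token):
    """Build the DELETE request for the bulk artifact deletion endpoint."""
    return urllib.request.Request(
        f"{base_url}/api/v4/projects/{project_id}/artifacts",
        method="DELETE",
        headers={"PRIVATE-TOKEN": token},
    )


def send(request):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Passing each prepared request to `send(...)` performs the call. For the bulk delete, a 202 status is what the comments quoted above report, and a new pipeline still has to run before previously locked artifacts become eligible.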

Upvotes: 30

Daniel Vianna

Reputation: 581

If you have deleted all the jobs by accident (thinking the artifacts would be gone, but they weren't), what alternative is there to brute-forcing a range of job IDs in a loop?

I have this code, which brute-forces a range of numbers. But since I use the gitlab.com public runners, it's a long range.

# project_id, find it here: https://gitlab.com/[organization name]/[repository name]/edit inside the "General project settings" tab
project_id="xxxxxx"

# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="yyyyy"
server="gitlab.com"


# Get a range from the oldest known job to the latest known one, then brute-force it. Useful when you have deleted the pipelines and can't retrieve the job IDs.

# https://stackoverflow.com/questions/52609966/for-loop-over-sequence-of-large-numbers-in-bash
for (( job_id = 59216999; job_id <= 190239535; job_id++ )) do
echo "$job_id"

echo Job ID being deleted is "$job_id"

curl --request POST --header "PRIVATE-TOKEN:${token}" "https://${server}/api/v4/projects/${project_id}/jobs/${job_id}/erase"
echo -en '\n'
echo -en '\n'
done

Upvotes: 2

Monish Sen

Reputation: 1888

None of the API-based solutions worked for me, because the DELETE API only sets the expiry date on the build; it is then up to Sidekiq to perform the deletion. If there is a bug in GitLab that ignores the expiry date on artifacts even when it is set, nothing will happen.

Builds can also be removed manually with rails runner. The script below cleans up both artifacts and job logs that are older than 1 month. Note that it only does so for the 20 projects that consume the most disk space.

#!/usr/bin/env ruby

# This is a ruby script to delete build artifacts from gitlab that are older than 1 month
# Copy this file to /tmp/ folder on gitlab server then execute rails runner as below
# gitlab-rails runner /tmp/cleanupArtifacts.rb

include ActionView::Helpers::NumberHelper


ProjectStatistics.order(build_artifacts_size: :desc).limit(20).each do |s|
  builds_artifacts = s.project.builds.with_downloadable_artifacts
  counter = 0 # number of builds with artifacts seen for this project
  builds_artifacts.find_each do |build|
    counter += 1
    puts "Build #{build.id} \t created at #{build.created_at}"

    # Destroying the build removes its artifacts and its job log
    if build.created_at < 1.month.ago
      puts "Build #{build.id} marked for deletion"
      build.destroy!
    end
  end
  puts "#{number_to_human_size(s.build_artifacts_size)} \t #{s.project.full_path} \t Builds: #{counter}"
end

Upvotes: 0

Am_I_Helpful

Reputation: 19168

The answers here summarise the options nicely; I am just adding the Python script I used to manually clean up only the artefacts on what was then the latest version of a GitLab Linux installation (v16.10.3-ee).

I first retrieved the number of pages and projects in our GitLab environment from the "x-total-pages" and "x-total" values returned by the command curl https://gitlab.company/api/v4/projects?private_token=<token> --head. Then I iterated through the GitLab projects page by page and collected the project IDs into a list. Lastly, I iterated through this list to perform the artefact cleanup.

# This is a sample Python script referenced on the idea from https://stackoverflow.com/a/70817349/3482140

import json
import requests


def clean_gitlab_artefact():
    base_url = "https://gitlab.company"
    access_token = "access-token"  # check with your GitLab project owner
    print(f'GET /version')
    x = (requests.get(f"{base_url}/api/v4/version", headers={"PRIVATE-TOKEN": access_token}))
    print(x)
    data = json.loads(x.text)
    print(f'Using GitLab version {data["version"]}. Implemented on 16.10.3-ee!')

    # # there were 173 projects at the time of running this script, which can be checked by exploring the
    # # value "x-total" of the command `curl https://gitlab.company/api/v4/projects?private_token=<token> --head`
    page = 1
    total_project_ids = []
    while page != 3:
        print(f'GET /project-IDs')
        projects = (requests.get(f"{base_url}/api/v4/projects?per_page=100&page={page}",
                                 headers={"PRIVATE-TOKEN": access_token}))
        page += 1
        # print(projects)
        data = json.loads(projects.text)
        project_ids = [o["id"] for o in data]
        total_project_ids += project_ids
    print(total_project_ids)

    for project_id in total_project_ids:
        request_str = f'projects/{project_id}/artifacts'
        url = f'{base_url}/api/v4/{request_str}'
        print(f'DELETE /{request_str}')
        x = (requests.delete(url, headers={"PRIVATE-TOKEN": access_token}))
        print(x)

if __name__ == '__main__':
    clean_gitlab_artefact()

The script returns quickly, with a 202 Accepted response for each project whose artefacts will be deleted, because the cleanup happens asynchronously in the background, as also explained in other answers here.
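
The script above hard-codes the number of pages (`while page != 3`), which only fits an instance with between 101 and 200 projects. A minimal sketch of a loop that discovers the page count by itself is shown below, assuming the API also sends an `X-Next-Page` response header next to the `x-total-pages` and `x-total` headers mentioned earlier, and that this header is empty on the last page. The function names are hypothetical, and `fetch` is injectable only so the loop can be tried without a server.

```python
import json
import urllib.request


def fetch_page(url, token):
    """Fetch one page; return (parsed JSON list, value of the X-Next-Page header)."""
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("X-Next-Page", "")


def all_project_ids(base_url, token, fetch=fetch_page):
    """Collect project IDs from every page, following X-Next-Page until it is empty."""
    ids = []
    page = "1"
    while page:
        url = f"{base_url}/api/v4/projects?per_page=100&page={page}"
        projects, page = fetch(url, token)
        ids.extend(project["id"] for project in projects)
    return ids
```

The returned list could replace `total_project_ids` in the script above, so the deletion loop stays unchanged.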

Upvotes: 0

el_tenedor

Reputation: 664

This Python solution worked for me with GitLab 13.11.3.

#!/bin/python3
# delete_artifacts.py  

import json
import requests

# adapt accordingly
base_url='https://gitlab.example.com'
project_id='1234'
access_token='123412341234'

#
# Get Version Tested with Version 13.11.3
# cf. https://docs.gitlab.com/ee/api/version.html#version-api
#
print(f'GET /version')
x= (requests.get(f"{base_url}/api/v4/version", headers = {"PRIVATE-TOKEN": access_token }))
print(x)
data=json.loads(x.text)
print(f'Using GitLab version {data["version"]}. Tested with 13.11.3')

#
# List project jobs
# cf. https://docs.gitlab.com/ee/api/jobs.html#list-project-jobs
#
request_str=f'projects/{project_id}/jobs'
url=f'{base_url}/api/v4/{request_str}'
print(f'GET /{request_str}')
x= (requests.get(url, headers = {"PRIVATE-TOKEN": access_token }))
print(x)
data=json.loads(x.text)

input('WARNING: This will delete all artifacts. Job logs will remain available. Press Enter to continue...' )

#
# Delete job artifacts
# cf. https://docs.gitlab.com/ee/api/job_artifacts.html#delete-artifacts
#
for entry in data:
    request_str=f'projects/{project_id}/jobs/{entry["id"]}/artifacts'
    url=f'{base_url}/api/v4/{request_str}'
    print(f'DELETE /{request_str}')
    x = requests.delete(url, headers = {"PRIVATE-TOKEN": access_token })
    print(x)

I'll keep an updated version here. Feel free to reach out and improve the code.
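
Since the question is about old artifacts specifically, the job list could be narrowed down before the delete loop runs. The sketch below is one way to do that, assuming each job entry returned by the jobs endpoint carries a `created_at` ISO 8601 timestamp; the helper names are hypothetical and not part of the script above.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2021-05-01T12:00:00.000Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def jobs_older_than(jobs, days, now=None):
    """Return the IDs of jobs whose `created_at` is more than `days` days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [job["id"] for job in jobs if parse_timestamp(job["created_at"]) < cutoff]
```

With the script above, `jobs_older_than(data, 30)` would give the job IDs to loop over instead of every entry in `data`.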

Upvotes: 2

charlesroelli

Reputation: 427

If you don't mind removing entire jobs along with their artifacts in bulk, you can use the glab CLI like this:

glab ci delete --dry-run --older-than 8760h --paginate

This removes all jobs older than 1 year. Just remove --dry-run to make it happen.

The artifacts seem to be deleted asynchronously, so it may take some time for your repository's storage usage to be updated.
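
The value 8760h is simply 365 days expressed in hours (365 × 24). For other retention periods, a tiny hypothetical helper like the one below can produce the string to pass to --older-than; it assumes the flag takes an hour-based duration as in the command above.

```python
def older_than_flag(days):
    """Convert a number of days into an hours value such as '8760h'."""
    return f"{days * 24}h"
```

For example, `older_than_flag(30)` gives the value for jobs older than 30 days.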

Upvotes: 4

Nemo

Reputation: 2544

As you said, it's possible to change the retention time by adding an artifacts:expire_in field in the job settings.

In my testing with GitLab 15.10.2-ee, the setting is applied retroactively to all matching jobs in the history. The deletion is not instant: it presumably happens when some scheduled job runs, probably once a day.

You could also change the instance-wide setting but that doesn't apply to past artifacts.

Upvotes: 0

Kartik Soneji

Reputation: 1246

Building on top of @David's answer, @Philipp pointed out that there is now an API endpoint to delete only the job artifacts instead of the entire job.

You can run this script directly in the browser's Dev Tools console, or use node-fetch to run in node.js.

//Go to: https://gitlab.com/profile/personal_access_tokens
const API_KEY = "API_KEY";

//You can find project id inside the "General project settings" tab
const PROJECT_ID = 12345678;
const PROJECT_URL = "https://gitlab.com/api/v4/projects/" + PROJECT_ID + "/"

let jobs = [];
for(let i = 0, currentJobs = []; i == 0 || currentJobs.length > 0; i++){
    currentJobs = await sendApiRequest(
        PROJECT_URL + "jobs/?per_page=100&page=" + (i + 1)
    ).then(e => e.json());
    jobs = jobs.concat(currentJobs);
}

//skip jobs without artifacts (an empty array is truthy, so check its length)
jobs = jobs.filter(e => e.artifacts && e.artifacts.length > 0);

//keep the latest build.
jobs.shift();

for(let job of jobs)
    await sendApiRequest(
        PROJECT_URL + "jobs/" + job.id + "/artifacts",
        {method: "DELETE"}
    );

async function sendApiRequest(url, options = {}){
    if(!options.headers)
        options.headers = {};
    options.headers["PRIVATE-TOKEN"] = API_KEY;

    return fetch(url, options);
}

Upvotes: 20

David Archer

Reputation: 2121

You can use the GitLab REST API to delete the artifacts from the jobs if you don't have direct access to the server. Here's a sample curl script that uses the API:

#!/bin/bash
    
# project_id, find it here: https://gitlab.com/[organization name]/[repository name]/edit inside the "General project settings" tab
project_id="3034900"
    
# token, find it here: https://gitlab.com/profile/personal_access_tokens
token="Lifg_azxDyRp8eyNFRfg"
server="gitlab.com"
    
# go to https://gitlab.com/[organization name]/[repository name]/-/jobs
# then open JavaScript console
# copy/paste => copy(_.uniq($('.ci-status').map((x, e) => /([0-9]+)/.exec(e.href)).toArray()).join(' '))
# press enter, and then copy the result here :
# repeat for every page you want
job_ids=(48875658 48874137 48873496 48872419)
    
for job_id in "${job_ids[@]}"
do
     URL="https://$server/api/v4/projects/$project_id/jobs/$job_id/erase"
     echo "$URL"
     curl --request POST --header "PRIVATE-TOKEN:${token}" "$URL"
     echo # blank line between responses (plain echo "\n" would print a literal \n)
done

Upvotes: 23

PapaSmurf

Reputation: 57

I am on GitLab 8.17 and am able to remove the artifacts for a particular job by navigating to the storage directory on the server itself. The default path is:

/var/opt/gitlab/gitlab-rails/shared/artifacts/<year_month>/<project_id?>/<jobid>

Removing either the whole folder for the job or just its contents makes the artifact view disappear from the GitLab pipeline page.

The storage path can be changed as described in docs:
https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/administration/job_artifacts.md#storing-job-artifacts

Upvotes: 4

Cephalopod

Reputation: 15145

According to the documentation, deleting the entire job log (click on the trash can) will also delete the artifacts.

Upvotes: 12
