Reputation: 67
I have a GitHub repo for a pipeline that requires very large input files (basic test datasets are around 1-2 GB).
I thought about circumventing this by running CI/CD locally, but then the CI/CD would not run when other people want to contribute to the repo, right?
Is there any workflow that allows for complex CI/CD with large datasets, while also enabling CI/CD on pull requests?
Upvotes: 5
Views: 1016
Reputation: 119
Use external storage for the large files in this scenario:

- Store the large files externally, e.g. in AWS S3, Google Drive, GCS, Azure Blob Storage, or GitHub Releases (for versioned datasets).
- Modify the CI/CD workflow to download the data from that external source before running tests.
- In GitHub Actions, use wget or aws s3 cp to fetch the dataset before the test step; see the sketch below.
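A minimal GitHub Actions sketch, assuming the dataset lives in an S3 bucket; the bucket name, path, secrets, and test command are placeholders you would replace with your own:

```yaml
name: pipeline-tests

on:
  pull_request:   # also runs when contributors open PRs

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Credentials for the bucket; add them as repository secrets.
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      # Fetch the large test dataset before running the pipeline tests.
      # Replace the bucket/prefix with your own dataset location.
      - name: Download test dataset
        run: aws s3 cp s3://my-dataset-bucket/test-data/ data/ --recursive

      # Replace with whatever command runs your pipeline's tests.
      - name: Run tests
        run: ./run_tests.sh
```

Note that repository secrets are not exposed to workflows triggered by pull requests from forks, so for external contributors a public, read-only download URL fetched with wget may be simpler. You can also wrap the download step with actions/cache so the 1-2 GB dataset is not re-downloaded on every run.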
Upvotes: 0