Reputation: 67
I have a GitHub repo for a pipeline that requires very large input files (basic test datasets are around 1-2 GB).
I thought about circumventing this by running CI/CD locally, but then the CI/CD would not run when other people want to contribute to the repo, right?
Is there any workflow that allows for complex CI/CD with large datasets, while also enabling CI/CD on pull requests?
Upvotes: 5
Views: 1016
Reputation: 119
Use external storage for the large files in this scenario:

- Store the large files externally, e.g. in AWS S3, Google Drive, GCS, Azure Blob Storage, or GitHub Releases (for versioned datasets).
- Modify the CI/CD workflow to download the data from that external source before running tests.
- In GitHub Actions, use wget or aws s3 cp to fetch the dataset before the test step; see the sketch below.
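A minimal GitHub Actions sketch, assuming the dataset lives in an S3 bucket; the bucket name, path, secrets, and test command are placeholders you would replace with your own:

```yaml
name: pipeline-tests

on:
  pull_request:   # also runs when contributors open PRs

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Credentials for the bucket; add them as repository secrets.
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      # Fetch the large test dataset before running the pipeline tests.
      # Replace the bucket/prefix with your own dataset location.
      - name: Download test dataset
        run: aws s3 cp s3://my-dataset-bucket/test-data/ data/ --recursive

      # Replace with whatever command runs your pipeline's tests.
      - name: Run tests
        run: ./run_tests.sh
```

Note that repository secrets are not exposed to workflows triggered by pull requests from forks, so for external contributors a public, read-only download URL fetched with wget may be simpler. You can also wrap the download step with actions/cache so the 1-2 GB dataset is not re-downloaded on every run.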
Upvotes: 0