Reputation: 81
I am new to KubeFlow and trying to port/adapt an existing solution to run in KubeFlow pipelines. The issue I am solving now is that the existing solution shares data via a mounted volume. I know this is not the best practice for exchanging data between components in KubeFlow, but this is a temporary proof of concept and I have no other choice.
I am facing issues with accessing an existing volume from the pipeline. I am basically running the code from the KubeFlow documentation here, but pointing it at an existing K8S Volume:
import kfp.dsl as dsl

def volume_op_dag():
    vop = dsl.VolumeOp(
        name="shared-cache",
        resource_name="shared-cache",
        size="5Gi",
        modes=dsl.VOLUME_MODE_RWO
    )
The volume shared-cache already exists in the cluster.
However, when I run the pipeline, a new volume is created instead.
What am I doing wrong? I obviously don't want to create a new volume every time I run the pipeline; I want to mount an existing one instead.
Edit: Adding KubeFlow versions:
Upvotes: 3
Views: 1848
Reputation: 11
You can use an already existing volume as follows:
volume_name = 'already_existing_volume_name'  # name of the existing PVC

# Instead of this (which creates a new volume on every run),
task = create_step_prepare_data().add_pvolumes({data_path: vop.volume})

# use this (mount the existing PVC via dsl.PipelineVolume(pvc=volume_name)):
task = create_step_prepare_data().add_pvolumes({data_path: dsl.PipelineVolume(pvc=volume_name)})
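For context, here is a minimal sketch of how this might look inside a complete pipeline definition. It is an illustration under assumptions: the PVC name shared-cache comes from the question, while the mount path /mnt/shared and the trivial placeholder component standing in for create_step_prepare_data are hypothetical.

import kfp.dsl as dsl
from kfp.components import create_component_from_func

def prepare_data():
    # Placeholder step; the real component would read/write data under the mounted path
    print("preparing data")

# Hypothetical lightweight component standing in for create_step_prepare_data
create_step_prepare_data = create_component_from_func(prepare_data, base_image="python:3.9")

@dsl.pipeline(name="use-existing-volume")
def use_existing_volume_pipeline():
    # Reference the PVC that already exists in the cluster instead of creating one with VolumeOp
    shared_volume = dsl.PipelineVolume(pvc="shared-cache")
    task = create_step_prepare_data().add_pvolumes({"/mnt/shared": shared_volume})

Compiled and run, this should attach the existing claim to the step rather than provisioning a new volume on every run.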
Upvotes: 1
Reputation: 85
Have a look at the function kfp.onprem.mount_pvc. You can find values for the arguments pvc_name and volume_name via the console command
kubectl -n <your-namespace> get pvc
The way you use it is to write the component as if the volume were already mounted, and then follow the example from the docs when binding it in the pipeline:
from kfp.onprem import mount_pvc

train = train_op(...)
train.apply(mount_pvc('claim-name', 'pipeline', '/mnt/pipeline'))
Also note that both the volume and the pipeline must be in the same namespace.
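Putting this together, here is a minimal sketch of a full pipeline using mount_pvc. The claim name shared-cache comes from the question; the placeholder train component is hypothetical, and the volume name 'pipeline' and mount path '/mnt/pipeline' simply follow the example above.

import kfp.dsl as dsl
from kfp.components import create_component_from_func
from kfp.onprem import mount_pvc

def train():
    # Placeholder training step; the real code would read/write under /mnt/pipeline
    print("training")

train_op = create_component_from_func(train, base_image="python:3.9")

@dsl.pipeline(name="mount-existing-pvc")
def mount_existing_pvc_pipeline():
    train_task = train_op()
    # Mount the pre-existing PVC into the step's container at /mnt/pipeline
    train_task.apply(mount_pvc('shared-cache', 'pipeline', '/mnt/pipeline'))

As noted above, the PVC and the pipeline run must live in the same namespace for the mount to succeed.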
Upvotes: 1