Reputation: 589
I am adding a persistent volume claim to my Kubeflow pipeline components, and I would like to be able to access the volume from the different components so I can store data in it and retrieve it from other components.
vop = kfp.dsl.VolumeOp(
    name="create-pvc",
    resource_name="my-pvc",
    modes=kfp.dsl.VOLUME_MODE_RWO,
    size=volume_size
)
comp1.add_pvolumes({"/mnt": vop.volume})
comp2.add_pvolumes({"/mnt": comp1.pvolume})
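# both components now mount the same PVC at /mnt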
import os
import pickle

# data read from an external source
data = some_data_frame_read_from_gcs
# data pickled to the volume
path = os.path.abspath("data")
with open(path, 'wb') as f:
    pickle.dump(data, f)
print("PATH: {}".format(path))
# output path is: "/mnt/data"
# Now I try to read it from comp2
with open(path, 'rb') as f:
    df = pickle.load(f)
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/data'
Upvotes: 1
Views: 3007
Reputation: 6787
Is there a particular reason you want to use volumes? KFP has built-in data-passing mechanisms (see "Data passing for Python").
I'd advise using the built-in data-passing methods if your data is <10GB in size.
Volumes are less well supported and make components non-portable; volumes are not supported on Vertex AI Pipelines, for example.
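For reference, here is a minimal sketch of that built-in data passing using the KFP v1 SDK (create_component_from_func with InputPath/OutputPath); the component bodies, base image, and pipeline name are illustrative placeholders, not your actual GCS-reading code:

import kfp
from kfp.components import create_component_from_func, InputPath, OutputPath

def produce_data(output_path: OutputPath()):
    # stand-in for your dataframe read from GCS; pickle it to the
    # path that KFP provides instead of a hand-mounted volume
    import pickle
    data = {"example": [1, 2, 3]}
    with open(output_path, 'wb') as f:
        pickle.dump(data, f)

def consume_data(data_path: InputPath()):
    # KFP materializes the upstream output locally and passes in its path
    import pickle
    with open(data_path, 'rb') as f:
        data = pickle.load(f)
    print(data)

produce_op = create_component_from_func(produce_data, base_image='python:3.9')
consume_op = create_component_from_func(consume_data, base_image='python:3.9')

@kfp.dsl.pipeline(name='data-passing-example')
def my_pipeline():
    produce_task = produce_op()
    # the '_path' suffix is stripped, so the output is named 'output'
    consume_op(produce_task.outputs['output'])

Each component only reads and writes local files; the backend moves the data between steps, which is what keeps the components portable across KFP and Vertex AI Pipelines.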
Upvotes: 2