Y. huang

Reputation: 53

Data versioning of "Hello_World" tutorial

I have added "versioned: true" to the "catalog.yml" file of the "hello_world" tutorial:

example_iris_data:
  type: pandas.CSVDataSet
  filepath: data/01_raw/iris.csv
  versioned: true

Then, when I ran the tutorial with "kedro run", I got the following error: "VersionNotFoundError: Did not find any versions for CSVDataSet".

May I know the right way to version the "iris.csv" file? Thanks!

Upvotes: 0

Views: 701

Answers (2)

Dmitry Deryabin

Reputation: 1578

The reason for the error is that when Kedro tries to load the dataset, it looks for a file at data/01_raw/iris.csv/<load_version>/iris.csv and, of course, cannot find such a path. So if you really want to enable versioning for your input data, you can move iris.csv like this:

mv data/01_raw/iris.csv data/01_raw/iris.csv_tmp
mkdir -p data/01_raw/iris.csv/<put_some_timestamp_here>
mv data/01_raw/iris.csv_tmp data/01_raw/iris.csv/<put_some_timestamp_here>/iris.csv

You wouldn't need to do that for any intermediate data, as these path manipulations are done by Kedro automatically when it saves a dataset (but not on load).
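If you prefer to script the move, here is a minimal Python sketch of the same steps. The helper names (make_version_label, move_into_versioned_layout) are made up for illustration and are not part of Kedro. The timestamp format is an assumption based on how Kedro names the versions it saves (something like 2019-02-13T14.35.36.518Z), so compare it against a version directory Kedro created for you before relying on it.

```python
from datetime import datetime, timezone
from pathlib import Path


def make_version_label() -> str:
    """Build a UTC timestamp label with millisecond precision.

    Assumption: mirrors the format Kedro uses for saved versions,
    e.g. 2019-02-13T14.35.36.518Z.
    """
    now = datetime.now(tz=timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H.%M.%S.%f")  # %f is microseconds (6 digits)
    return stamp[:-3] + "Z"  # keep only milliseconds, then append "Z"


def move_into_versioned_layout(filepath, version: str) -> Path:
    """Move <filepath> to <filepath>/<version>/<filename>."""
    filepath = Path(filepath)
    # Step 1: move the file aside so its name can be reused for a directory.
    tmp = filepath.with_name(filepath.name + "_tmp")
    filepath.rename(tmp)
    # Step 2: create <filepath>/<version>/ (including the parent directory).
    target_dir = filepath / version
    target_dir.mkdir(parents=True)
    # Step 3: move the file into the version directory under its original name.
    target = target_dir / filepath.name
    tmp.rename(target)
    return target
```

For the tutorial you would call it as move_into_versioned_layout("data/01_raw/iris.csv", make_version_label()) from the project root.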

Upvotes: 0

921kiyo

Reputation: 572

Try versioning one of the downstream outputs. For example, add this entry to your catalog.yml and then execute kedro run:

example_train_x:
  type: pandas.CSVDataSet
  filepath: data/02_intermediate/example_iris_data.csv
  versioned: true

You will then see an example_iris_data.csv directory (not a file) under data/02_intermediate. example_iris_data gives you an error because it is the starting data: iris.csv already exists as a file in data/01_raw, so Kedro cannot create a data/01_raw/iris.csv/ directory because of the name conflict with the existing iris.csv file.
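To check which versions exist under such a directory, something like the following sketch may help. It only approximates the lookup (list <filepath>/<version>/<filename> and take the newest by name) and is not Kedro's own code; find_latest_version is a hypothetical helper name.

```python
from pathlib import Path


def find_latest_version(filepath):
    """Return the newest <filepath>/<version>/<filename>, or None if there is none."""
    filepath = Path(filepath)
    # A plain file (like the original iris.csv) has no version subdirectories.
    if not filepath.is_dir():
        return None
    # Timestamp-named directories sort chronologically as plain strings,
    # so the last one in sorted order is the most recent.
    candidates = sorted(filepath.glob(f"*/{filepath.name}"), reverse=True)
    return candidates[0] if candidates else None
```

Calling it on data/01_raw/iris.csv while that is still a plain file returns None, which corresponds to the "Did not find any versions" situation in the question.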

Hope this helps :)

Upvotes: 1
