Reputation: 53
I have added "versioned: true" to the "catalog.yml" file of the "hello_world" tutorial:
example_iris_data:
  type: pandas.CSVDataSet
  filepath: data/01_raw/iris.csv
  versioned: true
Then, when I ran the tutorial with "kedro run", I got the following error: "VersionNotFoundError: Did not find any versions for CSVDataSet".
What is the right way to version the "iris.csv" file? Thanks!
Upvotes: 0
Views: 701
Reputation: 1578
The reason for the error is that when Kedro tries to load the dataset, it looks for a file at data/01_raw/iris.csv/<load_version>/iris.csv and, of course, cannot find that path. So if you really want to enable versioning for your input data, you can move iris.csv into that layout like this:
mv data/01_raw/iris.csv data/01_raw/iris.csv_tmp
mkdir -p data/01_raw/iris.csv/<put_some_timestamp_here>
mv data/01_raw/iris.csv_tmp data/01_raw/iris.csv/<put_some_timestamp_here>/iris.csv
You don't need to do this for intermediate data, because Kedro creates these versioned paths automatically when it saves a dataset (but not when it loads one).
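The three commands above can be wrapped in a small script. This is a minimal sketch under a few assumptions: a POSIX shell, a hypothetical helper name `version_raw_file`, and a version name shaped like the UTC timestamps Kedro generates itself (for example 2019-01-01T00.00.00.000Z), which keeps lexicographic order in line with chronological order. The demo at the bottom runs in a temporary directory so your project is untouched.

```shell
#!/bin/sh
# Sketch: move a raw input file into a versioned layout
#   <filepath>/<version>/<file name>
# so that a catalog entry with `versioned: true` has a version to load.
set -eu

# Hypothetical helper; $1 = raw file path, $2 = optional version name.
version_raw_file() {
  src="$1"
  name="$(basename "$src")"
  # Assumption: Kedro-style UTC timestamp with dots instead of colons.
  # Milliseconds are fixed at 000 because POSIX `date` has no ms field.
  version="${2:-$(date -u +%Y-%m-%dT%H.%M.%S).000Z}"

  [ -f "$src" ] || { echo "not a regular file: $src" >&2; return 1; }
  mv "$src" "${src}_tmp"          # free up the path for a directory
  mkdir -p "$src/$version"        # create iris.csv/<version>/
  mv "${src}_tmp" "$src/$version/$name"
  echo "$src/$version/$name"
}

# Demo in a throwaway directory so the real project is untouched.
demo="$(mktemp -d)"
mkdir -p "$demo/data/01_raw"
printf 'sepal_length,species\n5.1,setosa\n' > "$demo/data/01_raw/iris.csv"
version_raw_file "$demo/data/01_raw/iris.csv" "2019-01-01T00.00.00.000Z"
```

On a real project you would call it from the project root as version_raw_file data/01_raw/iris.csv, leaving out the second argument to get a timestamp for the current time.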
Upvotes: 0
Reputation: 572
Try versioning one of the downstream outputs instead. For example, add this entry to your catalog.yml and run kedro run:
example_train_x:
  type: pandas.CSVDataSet
  filepath: data/02_intermediate/example_iris_data.csv
  versioned: true
You will then see example_iris_data.csv as a directory (not a file) under data/02_intermediate. The reason example_iris_data gives you an error is that it is the starting data and there is already an iris.csv file in data/01_raw, so Kedro cannot create a data/01_raw/iris.csv/ directory because of the name conflict with that existing file.
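The name conflict is plain filesystem behaviour rather than anything Kedro-specific: one path cannot be both a regular file and a directory. A minimal sketch that reproduces it in a temporary directory (POSIX shell assumed; the sample CSV contents are made up):

```shell
#!/bin/sh
# Sketch: show that a directory cannot be created where a file already exists.
set -eu

demo="$(mktemp -d)"
mkdir -p "$demo/data/01_raw"
# A plain raw file, as in the tutorial (sample contents are made up).
printf 'sepal_length,species\n5.1,setosa\n' > "$demo/data/01_raw/iris.csv"

# A versioned dataset at this filepath would need iris.csv to be a directory.
if mkdir "$demo/data/01_raw/iris.csv" 2>/dev/null; then
  echo "created directory iris.csv"
else
  echo "mkdir failed: iris.csv is already a regular file"
fi
```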
Hope this helps :)
Upvotes: 1