Reputation: 165
I'm quite new to Kedro, and after installing it in my conda environment, I'm getting the following error when trying to list my catalog:
Command performed: kedro catalog list
Error:
kedro.io.core.DataSetError: An exception occurred when parsing config for DataSet df_medinfo_raw: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas. Please see the documentation on how to install relevant dependencies for kedro.extras.datasets.pandas.ParquetDataSet:
I installed kedro through conda-forge: conda install -c conda-forge "kedro[pandas]". As far as I understand, installing kedro this way also installs the pandas dependencies.
I tried to read the kedro documentation for dependencies, but it's not really clear how to solve this kind of issue.
My kedro version is 0.17.6.
Upvotes: 3
Views: 4881
Reputation: 77020
Kedro uses Pandas to load ParquetDataSet objects, and Pandas requires additional dependencies to accomplish this (see "Installation: Other data sources"). That is, in addition to Pandas, one must also install either fastparquet or pyarrow.
For Conda you either want:
## use pyarrow for parquet
conda install -c conda-forge kedro pandas pyarrow
or
## or use fastparquet for parquet
conda install -c conda-forge kedro pandas fastparquet
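To check which Parquet engine, if any, is visible in the active environment, something like the following sketch may help. It only uses the standard library; the function name available_parquet_engines is made up for this illustration and is not part of Kedro or Pandas.

```python
from importlib.util import find_spec


def available_parquet_engines():
    """Return the Parquet engines importable in the current environment."""
    # Hypothetical helper: these are the two engines named above.
    candidates = ("pyarrow", "fastparquet")
    # find_spec returns None when a top-level package cannot be located.
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    engines = available_parquet_engines()
    if engines:
        print("Parquet engine(s) found:", ", ".join(engines))
    else:
        print("No Parquet engine found; install pyarrow or fastparquet.")
```

Run it with the Python interpreter of the same conda environment that kedro is installed in, otherwise the result says nothing about that environment.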
Note that the syntax used in the question, kedro[pandas], is meaningless to Conda (i.e., it ultimately parses to just kedro). Conda package specification uses a custom grammar called MatchSpec, where anything inside a [...] is parsed for a [key1=value1;key2=value2;...] syntax. Essentially, the [pandas] is treated as an unknown key, which is ignored.
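As a rough illustration of why the bracket content gets dropped, here is a toy parser. It is not Conda's MatchSpec implementation; split_spec and its splitting rules are assumptions made up for this sketch, and the real grammar is more involved. It only shows the idea that bracket items are expected to look like key=value, so a bare word such as pandas carries no usable information.

```python
import re


def split_spec(spec):
    """Split 'name[...]' into (name, key/value pairs, ignored items).

    Toy model only: this does not reproduce Conda's actual parsing rules.
    """
    match = re.fullmatch(r"([^\[\]]+)(?:\[(.*)\])?", spec.strip())
    if match is None:
        raise ValueError(f"unrecognised spec: {spec!r}")
    name, bracket = match.group(1), match.group(2) or ""
    pairs, ignored = {}, []
    # Assumption: items are separated by ';' or ','.
    for item in re.split(r"[;,]", bracket):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            key, value = item.split("=", 1)
            pairs[key.strip()] = value.strip()
        else:
            # No '=' means there is no key/value pair to act on.
            ignored.append(item)
    return name, pairs, ignored


print(split_spec("kedro[pandas]"))
print(split_spec("kedro[version=0.17.6]"))
```

In this toy model the first call yields the name kedro with no key/value pairs and pandas in the ignored list, which mirrors the behaviour described above: the request ends up being just kedro.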
Upvotes: 3
Reputation: 2345
Try installing with pip:
pip install "kedro[pandas]"
As of now, conda doesn't support optional dependencies (extras). A feature request for this has been submitted here: https://github.com/conda/conda/issues/7502
Also, the kedro docs mention that pip is the recommended installer (https://kedro.readthedocs.io/en/stable/02_get_started/02_install.html):
"It is also possible to install Kedro using conda, as follows, but we recommend using pip at this point to eliminate any potential dependency issues, as follows:"
Also, as @datajoely mentioned, you can be more specific about which dataset modules you need:
pip install "kedro[pandas.ParquetDataSet]"
You can read more about kedro dependencies here: https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/01_dependencies.html?highlight=top-level#workflow-dependencies
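After installing, a quick way to confirm that the dataset class from the error message can now be loaded is a small import check like the sketch below. The helper name dataset_available is hypothetical; the module path kedro.extras.datasets.pandas and the class name ParquetDataSet are taken from the error message in the question (kedro 0.17.6) and may differ in other versions.

```python
import importlib


def dataset_available(module_name, class_name):
    """Return True if class_name can be found in module_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package itself (or one of its imports) is missing.
        return False
    # The module may import fine while the class is absent,
    # for example when an optional dependency is not installed.
    return hasattr(module, class_name)


if __name__ == "__main__":
    ok = dataset_available("kedro.extras.datasets.pandas", "ParquetDataSet")
    print("ParquetDataSet loadable:", ok)
```

If this prints False, the dependencies are still missing from the environment that this interpreter belongs to.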
Upvotes: 2