Rubens Rodrigues

Reputation: 165

AttributeError: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas

I'm quite new to Kedro. After installing Kedro in my conda environment, I get the following error when trying to list my catalog:

Command performed: kedro catalog list

Error:

kedro.io.core.DataSetError: An exception occurred when parsing config for DataSet df_medinfo_raw: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas. Please see the documentation on how to install relevant dependencies for kedro.extras.datasets.pandas.ParquetDataSet:

I installed kedro through conda-forge: conda install -c conda-forge "kedro[pandas]". As far as I understand, installing kedro this way also installs the pandas dependencies.

I tried to read the kedro documentation for dependencies, but it's not really clear how to solve this kind of issue.

My kedro version is 0.17.6.

Upvotes: 3

Views: 4881

Answers (2)

merv

Reputation: 77020

Kedro uses Pandas to load ParquetDataSet objects, and Pandas requires additional dependencies to accomplish this (see "Installation: Other data sources"). That is, in addition to Pandas, one must also install either fastparquet or pyarrow.
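To confirm that a missing Parquet engine is the cause, one can check which of the two engines is importable in the active environment. This is a minimal sketch using only the standard library; the helper name `available_parquet_engines` is made up for illustration and is not part of Kedro or Pandas.

```python
from importlib.util import find_spec


def available_parquet_engines():
    """Return the Parquet engines importable in the current environment."""
    # These are the two engines named above; the helper itself is hypothetical.
    candidates = ("pyarrow", "fastparquet")
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    engines = available_parquet_engines()
    if engines:
        print("Parquet engines found:", ", ".join(engines))
    else:
        print("No Parquet engine found; install pyarrow or fastparquet.")
```

If the list comes back empty, installing either engine should give Pandas what it needs to read Parquet files.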

For Conda, you want either:

## use pyarrow for parquet
conda install -c conda-forge kedro pandas pyarrow

or

## or use fastparquet for parquet
conda install -c conda-forge kedro pandas fastparquet

Note that the kedro[pandas] syntax used in the question is meaningless to Conda (i.e., it ultimately parses to just kedro). Conda package specification uses a custom grammar called MatchSpec, where anything inside a [...] is parsed for a [key1=value1;key2=value2;...] syntax. Essentially, the [pandas] is treated as an unknown key, which is ignored.

Upvotes: 3

Rahul Kumar

Reputation: 2345

Try installing with pip:

pip install "kedro[pandas]"

As of now, conda doesn't support optional dependencies. A feature request for this has been submitted here: https://github.com/conda/conda/issues/7502

Also, the Kedro docs recommend pip (https://kedro.readthedocs.io/en/stable/02_get_started/02_install.html):

It is also possible to install Kedro using conda, as follows, but we recommend using pip at this point to eliminate any potential dependency issues, as follows:

As @datajoely mentioned, you can also be more specific about which dataset modules you need:

pip install "kedro[pandas.ParquetDataSet]"
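To see which extras an installed package declares, one option is to read its metadata. The sketch below uses the standard library's `importlib.metadata`; the helper name `declared_extras` is hypothetical, and I have not checked whether a conda-installed Kedro records its extras in the same way as a pip-installed one.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(distribution):
    """Return the sorted extras a distribution declares, or [] if it is not installed."""
    try:
        meta = metadata(distribution)
    except PackageNotFoundError:
        return []
    # "Provides-Extra" is the core metadata field that lists optional extras.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    for extra in declared_extras("kedro"):
        print(extra)
```

Any name printed here can be used inside the square brackets of a pip requirement.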

You can read more about Kedro dependencies here: https://kedro.readthedocs.io/en/stable/04_kedro_project_setup/01_dependencies.html?highlight=top-level#workflow-dependencies

Upvotes: 2
