eric

Reputation: 33

Taipy Core configuration file

I am testing the Taipy library, and more precisely Taipy Core.

To test the library, I want to take 2 CSV files and merge them into a pandas dataframe. I want the resulting dataframe to be the data used by a dashboard built with Taipy GUI.

To provide extra context:

I have created a configuration TOML file, taking inspiration from the documentation. Here is how the file looks in the VS Code extension:

[screenshot: taipy config TOML file, as displayed in VS Code]

And here is the content of the TOML file:

[TAIPY]

[JOB]

[DATA_NODE.fossil_energy]
storage_type = "csv"
scope = "GLOBAL:SCOPE"
default_path = "data/per-capita-fossil-energy-vs-gdp.csv"
has_header = "True:bool"
exposed_type = "pandas"

[DATA_NODE.country_codes]
storage_type = "csv"
scope = "GLOBAL:SCOPE"
default_path = "data/country_codes.csv"
has_header = "True:bool"
exposed_type = "pandas"

[DATA_NODE.final_dataset]
storage_type = "pickle"
scope = "CYCLE:SCOPE"
exposed_type = "pandas"


[TASK.add_continent]
inputs = [ "fossil_energy:SECTION", "country_codes:SECTION"]
function = "config.functions.preprocess:function"
outputs = [ "final_dataset:SECTION",]
skippable = "False:bool"


[PIPELINE.create_dataset]
tasks = [ "add_continent:SECTION", ]


[SCENARIO.scenario_configuration]
pipelines = [ "create_dataset:SECTION", ]
frequency = "MONTHLY:FREQUENCY"

This is contained in a file called taipy-config.toml.
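
For reference, if I read the documentation correctly, this TOML should be roughly equivalent to the following programmatic configuration with the Taipy Python API (a sketch on my side, untested):

from taipy import Config, Scope, Frequency
from config.functions import preprocess

# CSV input data nodes (same settings as the [DATA_NODE.*] sections above)
fossil_energy_cfg = Config.configure_csv_data_node(
    id="fossil_energy",
    default_path="data/per-capita-fossil-energy-vs-gdp.csv",
    has_header=True,
    scope=Scope.GLOBAL,
)
country_codes_cfg = Config.configure_csv_data_node(
    id="country_codes",
    default_path="data/country_codes.csv",
    has_header=True,
    scope=Scope.GLOBAL,
)

# Pickle output data node
final_dataset_cfg = Config.configure_pickle_data_node(
    id="final_dataset",
    scope=Scope.CYCLE,
)

# Task applying the preprocess function to the two inputs
add_continent_cfg = Config.configure_task(
    id="add_continent",
    function=preprocess,
    input=[fossil_energy_cfg, country_codes_cfg],
    output=final_dataset_cfg,
)

# Pipeline and scenario, mirroring the [PIPELINE.*] and [SCENARIO.*] sections
pipeline_cfg = Config.configure_pipeline(
    id="create_dataset",
    task_configs=[add_continent_cfg],
)
scenario_cfg = Config.configure_scenario(
    id="scenario_configuration",
    pipeline_configs=[pipeline_cfg],
    frequency=Frequency.MONTHLY,
)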

Then, I have a python file called config.py, with only the following code:

from taipy import Config
import pandas as pd

Config.load("config/taipy-config.toml")

And in my app file, I have the following code:

# from data.data import dataset_fossil_fuels_gdp

import taipy as tp
import config.config as config

pipeline = tp.create_pipeline(config.create_dataset, name="Pipeline to load the dataset")

dataset_fossil_fuels_gdp = pipeline.final_dataset

....

I left the first line commented out because "dataset_fossil_fuels_gdp" is already a pandas DataFrame.

I get the following error:

    pipeline = tp.create_pipeline(config.create_dataset, name="Pipeline to load the dataset")
AttributeError: module 'config.config' has no attribute 'create_dataset'

Along with the same error, I still get these messages:

[2023-04-03 00:00:33,367][Taipy][INFO] Loading configuration. Filename: 'config/taipy-config.toml'

[2023-04-03 00:00:33,369][Taipy][INFO] Configuration 'config/taipy-config.toml' successfully loaded.

Thank you for your help!


Edit: I read the first 2 answers; here is some extra context. Below is the preprocess function: it merges the 2 pandas dataframes it takes as arguments and then does some transformations. As I said earlier, I have never used Taipy before; I am aware that the interesting part of it is being able to test different scenarios, but I thought I would start with a simple data transformation first:

def preprocess(dataset_fossil_fuels_gdp, country_codes):
    print("merging datasets")

    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp.merge(
        country_codes[["alpha-3", "region"]],
        how="left",
        left_on="Code",
        right_on="alpha-3",
    )
    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp[
        ~dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"].isnull()
    ].reset_index()

    dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] = (
        dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] * 1000
    )

    return dataset_fossil_fuels_gdp

I have changed my taipy_config.py file as suggested in the first answer:

import taipy as tp
from taipy import Config

Config.load("config/taipy-config.toml")

# Get the pipeline and scenario configuration
pipeline_cfg = Config.pipelines['create_dataset']
scenario_cfg = Config.scenarios['scenario_configuration']

# Run the Core service
tp.Core().run()

# Creation of a pipeline and scenario based on the configuration
pipeline = tp.create_pipeline(pipeline_cfg)
scenario = tp.create_scenario(scenario_cfg)

And in my main file:

import config.config as config

pipeline = config.pipeline_cfg

dataset_fossil_fuels_gdp = pipeline.final_dataset

print(dataset_fossil_fuels_gdp.head())

.........

I get the following error:

print(dataset_fossil_fuels_gdp.head())
AttributeError: 'NoneType' object has no attribute 'head'

I do not really understand the behavior behind this. I thought data nodes were data structures (of various possible kinds, such as pandas dataframes) and that tasks applied functions to the input structures and produced other data structures as output. This is all I am trying to do so far, before moving forward.
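
In plain pandas, with no Taipy involved, what I am trying to reproduce is simply this (using the same file paths and preprocess function as above):

import pandas as pd
from config.functions import preprocess

# Load the two CSV files that the data nodes point to
fossil_energy = pd.read_csv("data/per-capita-fossil-energy-vs-gdp.csv")
country_codes = pd.read_csv("data/country_codes.csv")

# Apply the "task" by hand and inspect the result
final_dataset = preprocess(fossil_energy, country_codes)
print(final_dataset.head())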

Upvotes: 0

Views: 358

Answers (2)

Florian Jacta

Reputation: 1521

Pipeline and scenario configurations cannot be accessed directly as module attributes.

When you load the configuration file (config/taipy-config.toml), your configuration objects are organized within the Config object.

  • Config.pipelines: returns a dictionary containing all pipeline configurations.
  • Config.scenarios: returns a dictionary containing all scenario configurations.

In your main.py, this code should work by itself:

from taipy import Config
import taipy as tp

# Loading of the TOML
Config.load('config/taipy-config.toml')

# Get the scenario configuration
scenario_cfg = Config.scenarios['scenario_configuration']

# Run the Core service
tp.Core().run()

# Creation of a scenario based on the configuration
scenario = tp.create_scenario(scenario_cfg)

# Submission of the scenario (executes all the tasks)
tp.submit(scenario)

# Read the output Data node of your task that you called final_dataset
print(scenario.final_dataset.read())

Remember that Taipy Core was designed to build the backend for your web application. It will be increasingly useful as your application grows in complexity, requiring more advanced data processing, seamless integration with various services, and efficient handling of multiple scenarios. Scenarios represent different versions of your pipelines that store previous runs, results, and data, allowing for easy comparison.
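
For instance, here is a quick sketch of that idea, reusing the same configuration as above:

import taipy as tp
from taipy import Config

Config.load('config/taipy-config.toml')
scenario_cfg = Config.scenarios['scenario_configuration']

tp.Core().run()

# Two scenarios built from the same configuration
scenario_a = tp.create_scenario(scenario_cfg)
scenario_a.name = "First run"
scenario_b = tp.create_scenario(scenario_cfg)
scenario_b.name = "Second run"

tp.submit(scenario_a)
tp.submit(scenario_b)

# Reading a data node that has never been written returns None; after
# submission each scenario's results can be read and compared.
# (With your CYCLE-scoped final_dataset, scenarios created in the same
# month actually share that output node.)
print(scenario_a.final_dataset.read().head())
print(scenario_b.final_dataset.read().head())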

Edit:

I will go ahead and describe a good structure for your project, based on what you have done so far.

Folder config:

  • put your file config/taipy-config.toml in it
  • you should have a Python file (config/functions.py). It contains the Python functions that you want to use in your Task.
def preprocess(dataset_fossil_fuels_gdp, country_codes):
    print("merging datasets", dataset_fossil_fuels_gdp, country_codes)

    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp.merge(
        country_codes[["alpha-3", "region"]],
        how="left",
        left_on="Code",
        right_on="alpha-3",
    )
    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp[
        ~dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"].isnull()
    ].reset_index()

    dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] = (
        dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] * 1000
    )

    return dataset_fossil_fuels_gdp

  • have a config/config.py file to manage your configuration. It gets the scenario configuration you have built graphically.
from taipy import Config

# Loading of the TOML
Config.load('config/taipy-config.toml')

# Get the scenario configuration
scenario_cfg = Config.scenarios['scenario_configuration']

Folder data:

You should have your two CSV files in this folder.

In main.py:

Here is an example of code to use your configuration.

  • Get the scenario configuration from the config/config.py file.
  • Creation of a scenario based on the scenario configuration.
  • Submission of the scenario (executes all of the tasks of the scenario).
  • Read the output Data node (final_dataset here).
from config.config import scenario_cfg
import taipy as tp

if __name__ == "__main__":
    # Run the Core service
    tp.Core().run()

    # Creation of a scenario based on the configuration
    scenario = tp.create_scenario(scenario_cfg)

    tp.submit(scenario)
    print(scenario.final_dataset.read())

[screenshot: project structure]

Upvotes: 2

Jean-Robin Medori

Reputation: 81

I assume that in your app file, with the code config.create_dataset, you are trying to use the 'create_dataset' PipelineConfig to instantiate a new pipeline from it. However, here config corresponds to your own module 'config.py', and it does not expose any attribute or variable named 'create_dataset'.

If you want to retrieve it, you must use the Taipy Config object (with an uppercase C) as follows:

Config.pipelines["create_dataset"]

Here is the code that should work.

import taipy as tp
import config as config  # Loads 'taipy-config.toml' into Taipy Config
from taipy import Config  # Imports Taipy Config


pipeline = tp.create_pipeline(Config.pipelines["create_dataset"]) 
pipeline.name = "Pipeline to load the dataset"

dataset_fossil_fuels_gdp = pipeline.final_dataset

Here is what I did:

  1. I imported taipy: import taipy as tp
  2. I imported Taipy Config: from taipy import Config
  3. I retrieved the PipelineConfig named "create_dataset" from the Taipy Config using the pipelines dictionary: Config.pipelines["create_dataset"]
  4. The function create_pipeline does not accept any **kwargs arguments, so I removed the name you provided and set it in a separate instruction: pipeline.name = "Pipeline to load the dataset"

I hope that can help.

By the way, I noticed a few things not related to your issue:

  1. In config.py, you don't need the pandas import (import pandas as pd)
  2. You are creating a scenario configuration but only instantiating a pipeline. You could instantiate a scenario instead. That is not mandatory, and it depends on your use case, but it is usually more convenient to benefit from all the functionalities associated with scenarios:

scenario = tp.create_scenario(Config.scenarios["scenario_configuration"])
scenario.name = "Scenario to load the dataset"

  3. The input data node configs fossil_energy and country_codes are GLOBAL scoped, which means only one data node instance can be created for each. The output data node config is CYCLE scoped, which means you will have one data node instance per cycle. This looks suspicious to me: if the inputs don't change over cycles, I expect the output not to change either. The only case I can imagine is if your preprocess function is not deterministic or depends on external resources.
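
For example, if the output should indeed follow the inputs, a one-line change gives it the same GLOBAL scope: in the TOML, scope = "GLOBAL:SCOPE" under [DATA_NODE.final_dataset], or with the Python API (a sketch):

from taipy import Config, Scope

# One shared instance of the output, matching the GLOBAL-scoped inputs
final_dataset_cfg = Config.configure_pickle_data_node(
    id="final_dataset",
    scope=Scope.GLOBAL,
)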

Upvotes: 2
