eric

Reputation: 33

Taipy Core configuration file

I am testing the Taipy library, and more precisely Taipy Core.

To test the library, I want to take 2 CSV files and merge them into a pandas dataframe. I want the resulting dataframe to be the data used by a dashboard built with Taipy GUI.

To provide extra context:

I have created a configuration TOML file, taking inspiration from the documentation. Here is how the file looks in the VS Code extension:

[screenshot: taipy config TOML file, as displayed in VS Code]

And here is the content of the TOML file:

[TAIPY]

[JOB]

[DATA_NODE.fossil_energy]
storage_type = "csv"
scope = "GLOBAL:SCOPE"
default_path = "data/per-capita-fossil-energy-vs-gdp.csv"
has_header = "True:bool"
exposed_type = "pandas"

[DATA_NODE.country_codes]
storage_type = "csv"
scope = "GLOBAL:SCOPE"
default_path = "data/country_codes.csv"
has_header = "True:bool"
exposed_type = "pandas"

[DATA_NODE.final_dataset]
storage_type = "pickle"
scope = "CYCLE:SCOPE"
exposed_type = "pandas"


[TASK.add_continent]
inputs = [ "fossil_energy:SECTION", "country_codes:SECTION"]
function = "config.functions.preprocess:function"
outputs = [ "final_dataset:SECTION",]
skippable = "False:bool"


[PIPELINE.create_dataset]
tasks = [ "add_continent:SECTION", ]


[SCENARIO.scenario_configuration]
pipelines = [ "create_dataset:SECTION", ]
frequency = "MONTHLY:FREQUENCY"

This is contained in a file called taipy-config.toml.
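
For reference, if I read the documentation correctly, this TOML should be roughly equivalent to the following programmatic configuration with the Taipy Python API (a sketch on my side, untested):

from taipy import Config, Scope, Frequency
from config.functions import preprocess

# CSV input data nodes (same settings as the [DATA_NODE.*] sections above)
fossil_energy_cfg = Config.configure_csv_data_node(
    id="fossil_energy",
    default_path="data/per-capita-fossil-energy-vs-gdp.csv",
    has_header=True,
    scope=Scope.GLOBAL,
)
country_codes_cfg = Config.configure_csv_data_node(
    id="country_codes",
    default_path="data/country_codes.csv",
    has_header=True,
    scope=Scope.GLOBAL,
)

# Pickle output data node
final_dataset_cfg = Config.configure_pickle_data_node(
    id="final_dataset",
    scope=Scope.CYCLE,
)

# Task applying the preprocess function to the two inputs
add_continent_cfg = Config.configure_task(
    id="add_continent",
    function=preprocess,
    input=[fossil_energy_cfg, country_codes_cfg],
    output=final_dataset_cfg,
)

# Pipeline and scenario, mirroring the [PIPELINE.*] and [SCENARIO.*] sections
pipeline_cfg = Config.configure_pipeline(
    id="create_dataset",
    task_configs=[add_continent_cfg],
)
scenario_cfg = Config.configure_scenario(
    id="scenario_configuration",
    pipeline_configs=[pipeline_cfg],
    frequency=Frequency.MONTHLY,
)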

Then, I have a python file called config.py, with only the following code:

from taipy import Config
import pandas as pd

Config.load("config/taipy-config.toml")

And in my app file, I have the following code:

# from data.data import dataset_fossil_fuels_gdp

import taipy as tp
import config.config as config

pipeline = tp.create_pipeline(config.create_dataset, name="Pipeline to load the dataset")

dataset_fossil_fuels_gdp = pipeline.final_dataset

....

I left the first line commented out because "dataset_fossil_fuels_gdp" is already a pandas DataFrame.

I get the following error:

    pipeline = tp.create_pipeline(config.create_dataset, name="Pipeline to load the dataset")
AttributeError: module 'config.config' has no attribute 'create_dataset'

Along with the same error, I still get these messages:

[2023-04-03 00:00:33,367][Taipy][INFO] Loading configuration. Filename: 'config/taipy-config.toml'

[2023-04-03 00:00:33,369][Taipy][INFO] Configuration 'config/taipy-config.toml' successfully loaded.

Thank you for your help!


Edit: I read the first 2 answers; here is some extra context. Below is the preprocess function: it merges the 2 pandas dataframes it takes as arguments and then does some transformations. As I said earlier, I have never used Taipy before; I am aware that the interesting part of it is being able to test different scenarios, but I thought I would start with a simple data transformation first:

def preprocess(dataset_fossil_fuels_gdp, country_codes):
    print("merging datasets")

    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp.merge(
        country_codes[["alpha-3", "region"]],
        how="left",
        left_on="Code",
        right_on="alpha-3",
    )
    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp[
        ~dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"].isnull()
    ].reset_index()

    dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] = (
        dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] * 1000
    )

    return dataset_fossil_fuels_gdp

I have changed my taipy_config.py file as suggested in the first answer:

import taipy as tp
from taipy import Config

Config.load("config/taipy-config.toml")

# Get the pipeline and scenario configuration
pipeline_cfg = Config.pipelines['create_dataset']
scenario_cfg = Config.scenarios['scenario_configuration']

# Run the Core service
tp.Core().run()

# Creation of a pipeline and scenario based on the configuration
pipeline = tp.create_pipeline(pipeline_cfg)
scenario = tp.create_scenario(scenario_cfg)

And in my main file:

import config.config as config

pipeline = config.pipeline_cfg

dataset_fossil_fuels_gdp = pipeline.final_dataset

print(dataset_fossil_fuels_gdp.head())

.........

I get the following error:

print(dataset_fossil_fuels_gdp.head())
AttributeError: 'NoneType' object has no attribute 'head'

I do not really understand the behavior behind this. I thought data nodes were data structures (of various possible kinds, such as pandas dataframes) and that tasks applied functions to the input structures and produced other data structures as output. This is all I am trying to do so far, before moving forward.
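
In plain pandas, with no Taipy involved, what I am trying to reproduce is simply this (using the same file paths and preprocess function as above):

import pandas as pd
from config.functions import preprocess

# Load the two CSV files that the data nodes point to
fossil_energy = pd.read_csv("data/per-capita-fossil-energy-vs-gdp.csv")
country_codes = pd.read_csv("data/country_codes.csv")

# Apply the "task" by hand and inspect the result
final_dataset = preprocess(fossil_energy, country_codes)
print(final_dataset.head())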

Upvotes: 0

Views: 358

Answers (2)

Florian Jacta

Reputation: 1521

Pipeline and scenario configurations cannot be accessed directly as module attributes.

When you load the configuration file (config/taipy-config.toml), your configuration objects are organized within the Config object.

  • Config.pipelines: returns a dictionary containing all pipeline configurations.
  • Config.scenarios: returns a dictionary containing all scenario configurations.

In your main.py, this code should work by itself:

from taipy import Config
import taipy as tp

# Loading of the TOML
Config.load('config/taipy-config.toml')

# Get the scenario configuration
scenario_cfg = Config.scenarios['scenario_configuration']

# Run the Core service
tp.Core().run()

# Creation of a scenario based on the configuration
scenario = tp.create_scenario(scenario_cfg)

# Submission of the scenario (executes all the tasks)
tp.submit(scenario)

# Read the output Data node of your task that you called final_dataset
print(scenario.final_dataset.read())

Remember that Taipy Core was designed to build the backend for your web application. It will be increasingly useful as your application grows in complexity, requiring more advanced data processing, seamless integration with various services, and efficient handling of multiple scenarios. Scenarios represent different versions of your pipelines that store previous runs, results, and data, allowing for easy comparison.
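
For instance, here is a quick sketch of that idea, reusing the same configuration as above:

import taipy as tp
from taipy import Config

Config.load('config/taipy-config.toml')
scenario_cfg = Config.scenarios['scenario_configuration']

tp.Core().run()

# Two scenarios built from the same configuration
scenario_a = tp.create_scenario(scenario_cfg)
scenario_a.name = "First run"
scenario_b = tp.create_scenario(scenario_cfg)
scenario_b.name = "Second run"

tp.submit(scenario_a)
tp.submit(scenario_b)

# Reading a data node that has never been written returns None; after
# submission each scenario's results can be read and compared.
# (With your CYCLE-scoped final_dataset, scenarios created in the same
# month actually share that output node.)
print(scenario_a.final_dataset.read().head())
print(scenario_b.final_dataset.read().head())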

Edit:

I will go ahead and describe a good structure for your project, based on what you have done so far.

Folder config:

  • put your file config/taipy-config.toml in it
  • you should have a Python file (config/functions.py). It contains the Python functions that you want to use in your Task.
def preprocess(dataset_fossil_fuels_gdp, country_codes):
    print("merging datasets", dataset_fossil_fuels_gdp, country_codes)

    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp.merge(
        country_codes[["alpha-3", "region"]],
        how="left",
        left_on="Code",
        right_on="alpha-3",
    )
    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp[
        ~dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"].isnull()
    ].reset_index()

    dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] = (
        dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] * 1000
    )

    return dataset_fossil_fuels_gdp

  • have a config/config.py file to manage your configuration. It gets the scenario configuration you have built graphically.
from taipy import Config

# Loading of the TOML
Config.load('config/taipy-config.toml')

# Get the scenario configuration
scenario_cfg = Config.scenarios['scenario_configuration']

Folder data:

You should have your two CSV files in this folder.

In main.py:

Here is an example of code to use your configuration.

  • Get the scenario configuration from the config/config.py file.
  • Creation of a scenario based on the scenario configuration.
  • Submission of the scenario (executes all of the tasks of the scenario).
  • Read the output Data node (final_dataset here).
from config.config import scenario_cfg
import taipy as tp

if __name__ == "__main__":
    # Run the Core service
    tp.Core().run()

    # Creation of a scenario based on the configuration
    scenario = tp.create_scenario(scenario_cfg)

    tp.submit(scenario)
    print(scenario.final_dataset.read())

[screenshot: project structure]

Upvotes: 2

Jean-Robin Medori

Reputation: 81

I assume that in your app file, with the code config.create_dataset, you are trying to use the 'create_dataset' PipelineConfig to instantiate a new pipeline from it. However, here config corresponds to your own module 'config.py', and it does not expose any attribute or variable named 'create_dataset'.

If you want to retrieve it, you must use the Taipy Config object (with an uppercase C) as follows:

Config.pipelines["create_dataset"]

Here is the code that should work.

import taipy as tp
import config as config  # Loads 'taipy-config.toml' into Taipy Config
from taipy import Config  # Imports Taipy Config


pipeline = tp.create_pipeline(Config.pipelines["create_dataset"]) 
pipeline.name = "Pipeline to load the dataset"

dataset_fossil_fuels_gdp = pipeline.final_dataset

Here is what I did:

  1. I imported taipy: import taipy as tp
  2. I imported Taipy Config: from taipy import Config
  3. I retrieved the PipelineConfig named "create_dataset" from the Taipy Config using the pipelines dictionary: Config.pipelines["create_dataset"]
  4. The function create_pipeline does not accept any **kwargs arguments, so I removed the name you provided and set it in a separate instruction: pipeline.name = "Pipeline to load the dataset"

I hope that can help.

By the way, I noticed a few things not related to your issue:

  1. In config.py, you don't need the pandas import (import pandas as pd)
  2. You are creating a scenario configuration but only instantiating a pipeline. You could instantiate a scenario instead. That is not mandatory, and it depends on your use case, but it is usually more convenient to benefit from all the functionalities associated with scenarios:

scenario = tp.create_scenario(Config.scenarios["scenario_configuration"])
scenario.name = "Scenario to load the dataset"

  3. The input data node configs fossil_energy and country_codes are GLOBAL scoped, which means only one data node instance can be created for each. The output data node config is CYCLE scoped, which means you will have one data node instance per cycle. This looks suspicious to me: if the inputs don't change over cycles, I expect the output not to change either. The only case I can imagine is if your preprocess function is not deterministic or depends on external resources.
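
For example, if the output should indeed follow the inputs, a one-line change gives it the same GLOBAL scope: in the TOML, scope = "GLOBAL:SCOPE" under [DATA_NODE.final_dataset], or with the Python API (a sketch):

from taipy import Config, Scope

# One shared instance of the output, matching the GLOBAL-scoped inputs
final_dataset_cfg = Config.configure_pickle_data_node(
    id="final_dataset",
    scope=Scope.GLOBAL,
)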

Upvotes: 2
