Reputation: 33
I am testing the Taipy library, and more precisely Taipy Core.
To test the library, I want to take 2 CSV files and merge them into a pandas DataFrame. I want the resulting DataFrame to be the data used for a dashboard built with Taipy GUI.
To provide extra context:
I have created a configuration TOML file, taking inspiration from the documentation; here is how the file looks in the VS Code extension (Taipy Studio):

And here is the content of the TOML file:
[TAIPY]
[JOB]
[DATA_NODE.fossil_energy]
storage_type = "csv"
scope = "GLOBAL:SCOPE"
default_path = "data/per-capita-fossil-energy-vs-gdp.csv"
has_header = "True:bool"
exposed_type = "pandas"
[DATA_NODE.country_codes]
storage_type = "csv"
scope = "GLOBAL:SCOPE"
default_path = "data/country_codes.csv"
has_header = "True:bool"
exposed_type = "pandas"
[DATA_NODE.final_dataset]
storage_type = "pickle"
scope = "CYCLE:SCOPE"
exposed_type = "pandas"
[TASK.add_continent]
inputs = [ "fossil_energy:SECTION", "country_codes:SECTION"]
function = "config.functions.preprocess:function"
outputs = [ "final_dataset:SECTION",]
skippable = "False:bool"
[PIPELINE.create_dataset]
tasks = [ "add_continent:SECTION", ]
[SCENARIO.scenario_configuration]
pipelines = [ "create_dataset:SECTION", ]
frequency = "MONTHLY:FREQUENCY"
This is contained in a file called taipy-config.toml.
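For reference, the same configuration could also be declared directly in Python with the Config API; this is a sketch assuming the Taipy 2.x API (the version with pipelines), with the values copied from the TOML above:

from taipy import Config, Frequency, Scope
from config.functions import preprocess  # the function referenced in the TOML

# CSV input data nodes, shared globally
fossil_energy_cfg = Config.configure_csv_data_node(
    id="fossil_energy",
    default_path="data/per-capita-fossil-energy-vs-gdp.csv",
    has_header=True,
    exposed_type="pandas",
    scope=Scope.GLOBAL,
)
country_codes_cfg = Config.configure_csv_data_node(
    id="country_codes",
    default_path="data/country_codes.csv",
    has_header=True,
    exposed_type="pandas",
    scope=Scope.GLOBAL,
)

# pickle output data node, one instance per cycle
final_dataset_cfg = Config.configure_pickle_data_node(
    id="final_dataset",
    scope=Scope.CYCLE,
)

# the task applies preprocess to the two inputs
task_cfg = Config.configure_task(
    id="add_continent",
    function=preprocess,
    input=[fossil_energy_cfg, country_codes_cfg],
    output=final_dataset_cfg,
    skippable=False,
)

# pipeline and scenario built on top of the task
pipeline_cfg = Config.configure_pipeline(id="create_dataset", task_configs=[task_cfg])
scenario_cfg = Config.configure_scenario(
    id="scenario_configuration",
    pipeline_configs=[pipeline_cfg],
    frequency=Frequency.MONTHLY,
)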
Then, I have a Python file called config.py, with only the following code:
from taipy import Config
import pandas as pd
Config.load("config/taipy-config.toml")
And in my app file, I have the following code:
# from data.data import dataset_fossil_fuels_gdp
import taipy as tp
import config.config as config

pipeline = tp.create_pipeline(config.create_dataset, name="Pipeline to load the dataset")
dataset_fossil_fuels_gdp = pipeline.final_dataset
....
I left the first line commented out because "dataset_fossil_fuels_gdp" is already a pandas DataFrame built in plain Python.
I get the following error:
pipeline = tp.create_pipeline(config.create_dataset, name="Pipeline to load the dataset")
AttributeError: module 'config.config' has no attribute 'create_dataset'
Along with the error, I also get these messages:
[2023-04-03 00:00:33,367][Taipy][INFO] Loading configuration. Filename: 'config/taipy-config.toml'
[2023-04-03 00:00:33,369][Taipy][INFO] Configuration 'config/taipy-config.toml' successfully loaded.
Thank you for your help!
Edit: I read the first 2 answers; here is some extra content. Here is the preprocess function: it merges the 2 pandas DataFrames it takes as arguments, then does some transformation. As I said earlier, I have never used Taipy before; I am aware that the interesting part of it is being able to test different scenarios, but I thought I would do a simple data transformation first:
def preprocess(dataset_fossil_fuels_gdp, country_codes):
    print("merging datasets")
    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp.merge(
        country_codes[["alpha-3", "region"]],
        how="left",
        left_on="Code",
        right_on="alpha-3",
    )
    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp[
        ~dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"].isnull()
    ].reset_index()
    dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] = (
        dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] * 1000
    )
    return dataset_fossil_fuels_gdp
I have changed my config.py file as suggested in the first answer:
Config.load("config/taipy-config.toml")
# Get the pipeline and scenario configuration
pipeline_cfg = Config.pipelines['create_dataset']
scenario_cfg = Config.scenarios['scenario_configuration']
# Run the Core service
tp.Core().run()
# Creation of a pipeline and scenario based on the configuration
pipeline = tp.create_pipeline(pipeline_cfg)
scenario = tp.create_scenario(scenario_cfg)
And in my main file:
import config.config as config
pipeline = config.pipeline_cfg
dataset_fossil_fuels_gdp = pipeline.final_dataset
print(dataset_fossil_fuels_gdp.head())
.........
I get the following error:
print(dataset_fossil_fuels_gdp.head())
AttributeError: 'NoneType' object has no attribute 'head'
I do not really understand the behavior behind it. I thought data nodes were data structures (of different possible kinds, such as pandas DataFrames), and that tasks applied functions to the input structures and produced other data structures as output. This is what I am trying to do so far, before moving forward.
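(From the answers, the model seems to be that a data node wraps its data rather than being the data itself; a minimal sketch, assuming scenario is a scenario instance that has already been created and submitted:)

df = scenario.final_dataset.read()    # returns the stored pandas DataFrame (None before any run)
scenario.final_dataset.write(df)      # writes data back into the node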
Upvotes: 0
Views: 358
Reputation: 1521
Pipeline and scenario configurations cannot be accessed directly as attributes of your own module.
When you load the configuration file (config/taipy-config.toml), your configuration objects are organized within the Config object:

- Config.pipelines: returns a dictionary containing all pipeline configurations.
- Config.scenarios: returns a dictionary containing all scenario configurations.

In your main.py, this code should work by itself:
from taipy import Config
import taipy as tp
# Loading of the TOML
Config.load('config/taipy-config.toml')
# Get the scenario configuration
scenario_cfg = Config.scenarios['scenario_configuration']
# Run the Core service
tp.Core().run()
# Creation of a scenario based on the configuration
scenario = tp.create_scenario(scenario_cfg)
# Submission of the scenario (executes all the tasks)
tp.submit(scenario)
# Read the output Data node of your task that you called final_dataset
print(scenario.final_dataset.read())
Remember that Taipy Core was designed to build the backend for your web application. It will be increasingly useful as your application grows in complexity, requiring more advanced data processing, seamless integration with various services, and efficient handling of multiple scenarios. Scenarios represent different versions of your pipelines that store previous runs, results, and data, allowing for easy comparison.
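For example, here is a sketch of that comparison idea (assuming the Taipy 2.x API and a running Core service; note that with your CYCLE-scoped final_dataset, two scenarios created in the same month would share the same output node, so a true per-scenario comparison would need SCENARIO scope):

import taipy as tp

# two scenarios built from the same configuration
scenario_a = tp.create_scenario(scenario_cfg, name="first run")
scenario_b = tp.create_scenario(scenario_cfg, name="second run")

tp.submit(scenario_a)
tp.submit(scenario_b)

# each submission stores its results, which can then be compared
print(scenario_a.final_dataset.read().shape)
print(scenario_b.final_dataset.read().shape)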
I will go ahead and describe the best structure for your project based on what you have done so far.

config/functions.py, referenced in your TOML as "config.functions.preprocess:function", contains the preprocessing function:
def preprocess(dataset_fossil_fuels_gdp, country_codes):
    print("merging datasets", dataset_fossil_fuels_gdp, country_codes)
    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp.merge(
        country_codes[["alpha-3", "region"]],
        how="left",
        left_on="Code",
        right_on="alpha-3",
    )
    dataset_fossil_fuels_gdp = dataset_fossil_fuels_gdp[
        ~dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"].isnull()
    ].reset_index()
    dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] = (
        dataset_fossil_fuels_gdp["Fossil fuels per capita (kWh)"] * 1000
    )
    return dataset_fossil_fuels_gdp
config/config.py loads the TOML and exposes the scenario configuration:

from taipy import Config

# Loading of the TOML
Config.load('config/taipy-config.toml')

# Get the scenario configuration
scenario_cfg = Config.scenarios['scenario_configuration']
You should have your two CSV files in the data folder (data/), matching the default_path values of your configuration.
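Based on the paths used throughout, the implied project layout is:

main.py
config/
    config.py             (loads the TOML, exposes scenario_cfg)
    functions.py          (contains preprocess)
    taipy-config.toml
data/
    per-capita-fossil-energy-vs-gdp.csv
    country_codes.csv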
Here is an example of a main.py using your configuration.
from config.config import scenario_cfg
import taipy as tp
if __name__ == "__main__":
# Run the Core service
tp.Core().run()
# Creation of a scenario based on the configuration
scenario = tp.create_scenario(scenario_cfg)
tp.submit(scenario)
print(scenario.final_dataset.read())
Upvotes: 2
Reputation: 81
I assume that, in your app file, with the code config.create_dataset you are trying to use the 'create_dataset' PipelineConfig to instantiate a new pipeline from it. However, here config corresponds to your own module 'config.py', and you don't expose any attribute or variable named 'create_dataset'.
If you want to retrieve it, you must use the Taipy Config object (with an upper-case C) as follows:

Config.pipelines["create_dataset"]
Here is the code that should work.
import taipy as tp
from taipy import Config  # Imports the Taipy Config object
import config.config  # Running this module loads 'taipy-config.toml' into the Taipy Config

pipeline = tp.create_pipeline(Config.pipelines["create_dataset"])
pipeline.name = "Pipeline to load the dataset"
dataset_fossil_fuels_gdp = pipeline.final_dataset
Here is what I did:

- Imported Taipy and the Taipy Config object: import taipy as tp and from taipy import Config.
- Retrieved the PipelineConfig named "create_dataset" from the Taipy Config using the pipeline dictionary: Config.pipelines["create_dataset"].
- Created the pipeline from it and gave it a name: pipeline.name = "Pipeline to load the dataset".
I hope that can help.
By the way, I noticed a few things not related to your issue:

- In config.py, you don't need to import pandas (import pandas as pd).
- In the same way as for the pipeline, you can create and name a scenario: scenario = tp.create_scenario(Config.scenarios["scenario_configuration"]) and scenario.name = "Scenario to load the dataset".
- fossil_energy and country_codes are GLOBAL scoped. That means only one data node instance can be created for each. The output data node config is CYCLE scoped, which means you will have one data node instance per cycle. This looks suspicious to me: if the inputs don't change over cycles, I expect the output not to change either. The only case I can imagine is if your function preprocess is not deterministic and depends on external resources.
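If the inputs really never change, one option is to give the output the same scope as the inputs; a sketch using the Python Config API (as an alternative to editing the TOML):

from taipy import Config, Scope

# make the output GLOBAL like its inputs, so all cycles share one result
final_dataset_cfg = Config.configure_pickle_data_node(
    id="final_dataset",
    scope=Scope.GLOBAL,
)

Conversely, SCENARIO scope on final_dataset would give each scenario its own output, which is what you would want when comparing scenarios.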
Upvotes: 2