Bartek Lachowicz

Reputation: 189

How to use MlflowClient.log_inputs with a pandas DataFrame when logging inputs?

Question that needs answering:

Can I replace old_func, which uses mlflow.log_input, with MlflowClient.log_inputs without having to call the private method _to_mlflow_entity() on the dataset?

Information

I wrote some helpers for MLflow using the mlflow.[stuff] style. I have now switched to the MlflowClient.[stuff] way of using MLflow.

I am porting the code below:

import mlflow
import pandas as pd
from mlflow.data import pandas_dataset

def old_func(x: pd.DataFrame, x_name: str = "train_x"):
    # Log metadata of the dataset
    mlflow.log_input(pandas_dataset.from_pandas(x, name=x_name))
    # Save the data itself as a table artifact
    mlflow.log_table(data=x, artifact_file=f"{x_name}.json")

The only way I managed to make it work is by calling the private ._to_mlflow_entity() method on the dataset, since mlflow.log_input() accepts an mlflow.data.dataset.Dataset, while MlflowClient.log_inputs() expects DatasetInput objects that wrap an mlflow.entities.Dataset:

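from mlflow.entities import DatasetInput

def new_func(self, x: pd.DataFrame, x_name: str = "train_x"):
    # Convert the mlflow.data dataset into an mlflow.entities.Dataset (private API)
    dataset_x = pandas_dataset.from_pandas(x, name=x_name)._to_mlflow_entity()
    input_dataset_x = DatasetInput(dataset=dataset_x)

    # Log metadata of the dataset
    self.client.log_inputs(
        self.run.info.run_id,
        [input_dataset_x]
    )
    # Save the data itself as a table artifact
    self.client.log_table(run_id=self.run.info.run_id, data=x, artifact_file=f"{x_name}.json")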
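
To make the type mismatch concrete, here is a rough sketch of one possible way to build the mlflow.entities.Dataset by hand instead of calling the private method. It assumes an MLflow version in which the dataset object exposes a public to_dict() that takes no arguments and returns the keys name, digest, source_type and source (plus schema and profile for pandas datasets); that should be checked against the installed version. The names entity_fields and log_dataframe_input are hypothetical helpers made up for this illustration, not part of MLflow.

```python
from typing import Any, Dict, Optional


def entity_fields(dataset_dict: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Select the constructor arguments of mlflow.entities.Dataset from a to_dict() result."""
    return {
        "name": dataset_dict["name"],
        "digest": dataset_dict["digest"],
        "source_type": dataset_dict["source_type"],
        "source": dataset_dict["source"],
        # schema and profile are optional; not every dataset type provides them.
        "schema": dataset_dict.get("schema"),
        "profile": dataset_dict.get("profile"),
    }


def log_dataframe_input(client, run_id: str, df, name: str = "train_x") -> None:
    """Hypothetical replacement for the dataset-logging part of new_func."""
    # Imported lazily so entity_fields stays usable without MLflow installed.
    from mlflow.data import pandas_dataset
    from mlflow.entities import Dataset, DatasetInput

    dataset = pandas_dataset.from_pandas(df, name=name)
    # Assumption: to_dict() is public and takes no arguments in the installed version.
    entity = Dataset(**entity_fields(dataset.to_dict()))
    client.log_inputs(run_id, [DatasetInput(dataset=entity)])
```

Inside new_func this would be called as log_dataframe_input(self.client, self.run.info.run_id, x, x_name). Whether this is actually preferable to ._to_mlflow_entity() is open: it only swaps the private call for a manual copy of the same fields.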

Upvotes: 0

Views: 91

Answers (0)
