Error in the MS Azure autoML preparation - wrong file format / encoding?

Question

I am trying to deploy the MS Azure automated machine learning as per the following Github example:

https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/classification-bank-marketing

I changed the code there to feed it with my data, but I am getting the following error when executing the autoML run:

automl.client.core.common.exceptions.DataprepException: Could not execute the specified transform.

coming from the:
  File "/azureml-envs/azureml_e9e27206cd19de471f4e5c7a1171037e/lib/python3.6/site-packages/azureml/automl/core/dataprep_utilities.py", line 50, in try_retrieve_pandas_dataframe_adb

Now, I thought there is sth. wrong with my data, but then I performed the following experiment with the original csv file:

1-st execution as in the Github example, building the dataflow directly based on the http link 2-nd execution building the dataflow based on the same csv, but downloaded to my share.

In the second case I got the same error as with my data. This would mean, that the Azure autoML run / dataflow / preparation process accepts only specific file format, which got changed when saving to my drive. I am not sure if this is about encoding or anything else. Could you please advice?

########################################
#Case 1, Error returned

data= "\\dwdf219\...\bankmarketing_train.csv"
dflow = dprep.auto_read_file(data)
dflow.get_profile()
X_train = dflow.drop_columns(columns=['y'])
y_train = dflow.keep_columns(columns=['y'], validate_column_exists=True)
dflow.head()

# Train
automl_settings = {
    "iteration_timeout_minutes": 10,
    "iterations": 5,
    "n_cross_validations": 2,
    "primary_metric": 'AUC_weighted',
    "preprocess": True,
    "max_concurrent_iterations": 5,
    "verbosity": logging.INFO,
}

automl_config = AutoMLConfig(task = 'classification',
                             debug_log = 'automl_errors.log',
                             path = project_folder,
                             run_configuration=conda_run_config,
                             X = X_train,
                             y = y_train,
                             **automl_settings
                            )     

remote_run = experiment.submit(automl_config, show_output = True)


########################################
#Case 2, all works fine

data = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv"
dflow = dprep.auto_read_file(data)
dflow.get_profile()
X_train = dflow.drop_columns(columns=['y'])
y_train = dflow.keep_columns(columns=['y'], validate_column_exists=True)
dflow.head()

# Train ...
###################################

Error in the MS Azure autoML preparation - wrong file format / encoding?

Answers (1)

Related Questions