Reputation: 515
In this project, I am trying to use the pycaret package, together with scikit-learn, to analyze some time series. Specifically, I have imported some modules as follows:
from pycaret.regression import (setup, compare_models, predict_model, plot_model, finalize_model, load_model)
# Set up the training environment
s = setup(
    data=train,
    target=target_var,
    ignore_features=['Series'],
    numeric_features=involved_numerics,
    categorical_features=categorics,
    silent=True,
    log_experiment=True,
)
# Train and compare the candidate models, then keep the one with the lowest MAE
best_model = compare_models(sort='MAE')
# Save one plot per plot id
for id, name in zip(ids, names):
    plot_model(best_model, plot=id, scale=3, save=True)
.
.
.
The code runs successfully for some of the plots in the list of available plots in the documentation, but not for all of them. For some specific plots (such as Recursive Feat. Selection), there is an error message:
Traceback (most recent call last):
  File "c:/Users/username/Desktop/project/project.py", line 55, in <module>
    main()
  File "c:/Users/username/Desktop/project/project.py", line 48, in main
    ml_modelling(data, train, test)
  File "c:\Users\username\Desktop\project\utilities.py", line 1070, in ml_modelling
    plot_model(best_model, plot=id, scale=3, save=True)
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\regression.py", line 1601, in plot_model
    return pycaret.internal.tabular.plot_model(
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\tabular.py", line 7712, in plot_model
    ret = locals()[plot]()
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\tabular.py", line 6293, in residuals_interactive
    resplots.write_html(plot_filename)
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\plots\residual_plots.py", line 673, in write_html
    f.write(html)
  File "C:\Users\username\anaconda3\envs\py38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25c4' in position 276445: character maps to <undefined>
Here is the training data (train):
   Series  x  y  z     ID  var1  var2  var3  var4  var5  var6
0       1  2  1  3   True    -3    -4     6     7     4     6
1       2  2  1  7  False    22     0     3     5     2     8
2       3  2  1  0   True     3    -6     3     5     4     4
3       4  2  1  4  False    27    -4     8     3    -3     2
.
.
.
I am using VS Code to run my Python tool on a Windows 10 machine. Here is the list of all packages installed in the conda environment:
name: py38
channels:
  - conda-forge
  - defaults
dependencies:
  - bzip2=1.0.8=h8ffe710_4
  - ca-certificates=2022.12.7=h5b45459_0
  - et_xmlfile=1.1.0=pyhd8ed1ab_0
  - libffi=3.4.2=h8ffe710_5
  - libsqlite=3.40.0=hcfcfb64_0
  - libzlib=1.2.13=hcfcfb64_4
  - openpyxl=3.0.10=py38h91455d4_2
  - openssl=3.0.7=hcfcfb64_2
  - pip=22.3.1=pyhd8ed1ab_0
  - python=3.8.15=h4de0772_1_cpython
  - python_abi=3.8=3_cp38
  - setuptools=66.1.1=pyhd8ed1ab_0
  - tk=8.6.12=h8ffe710_0
  - ucrt=10.0.22621.0=h57928b3_0
  - vc=14.3=hb6edc58_10
  - vs2015_runtime=14.34.31931=h4c5c07a_10
  - wheel=0.38.4=pyhd8ed1ab_0
  - xz=5.2.6=h8d14728_0
  - pip:
    - alembic==1.9.2
    - asttokens==2.2.1
    - attrs==22.2.0
    - backcall==0.2.0
    - blis==0.7.9
    - boruta==0.3
    - catalogue==1.0.2
    - certifi==2022.12.7
    - charset-normalizer==3.0.1
    - click==8.1.3
    - cloudpickle==2.2.1
    - colorama==0.4.6
    - colorlover==0.3.0
    - comm==0.1.2
    - contourpy==1.0.7
    - cufflinks==0.17.3
    - cycler==0.11.0
    - cymem==2.0.7
    - cython==0.29.14
    - databricks-cli==0.17.4
    - debugpy==1.6.6
    - decorator==5.1.1
    - docker==6.0.1
    - entrypoints==0.4
    - executing==1.2.0
    - flask==2.2.2
    - fonttools==4.38.0
    - funcy==1.18
    - future==0.18.3
    - gensim==3.8.3
    - gitdb==4.0.10
    - gitpython==3.1.30
    - greenlet==2.0.2
    - htmlmin==0.1.12
    - idna==3.4
    - imagehash==4.3.1
    - imbalanced-learn==0.7.0
    - importlib-metadata==5.2.0
    - importlib-resources==5.10.2
    - ipykernel==6.20.2
    - ipython==8.9.0
    - ipywidgets==8.0.4
    - itsdangerous==2.1.2
    - jedi==0.18.2
    - jinja2==3.1.2
    - joblib==1.2.0
    - jupyter-client==8.0.1
    - jupyter-core==5.1.5
    - jupyterlab-widgets==3.0.5
    - kiwisolver==1.4.4
    - kmodes==0.12.2
    - lightgbm==3.3.5
    - llvmlite==0.37.0
    - mako==1.2.4
    - markdown==3.4.1
    - markupsafe==2.1.2
    - matplotlib==3.6.3
    - matplotlib-inline==0.1.6
    - mlflow==2.1.1
    - mlxtend==0.19.0
    - multimethod==1.9.1
    - murmurhash==1.0.9
    - nest-asyncio==1.5.6
    - networkx==3.0
    - nltk==3.8.1
    - numba==0.54.1
    - numexpr==2.8.4
    - numpy==1.20.3
    - oauthlib==3.2.2
    - packaging==22.0
    - pandas==1.5.3
    - pandas-profiling==3.6.3
    - parso==0.8.3
    - patsy==0.5.3
    - phik==0.12.3
    - pickleshare==0.7.5
    - pillow==9.4.0
    - plac==1.1.3
    - platformdirs==2.6.2
    - plotly==5.13.0
    - preshed==3.0.8
    - prompt-toolkit==3.0.36
    - protobuf==4.21.12
    - psutil==5.9.4
    - pure-eval==0.2.2
    - pyarrow==10.0.1
    - pycaret==2.3.10
    - pydantic==1.10.4
    - pygments==2.14.0
    - pyjwt==2.6.0
    - pyldavis==3.3.1
    - pynndescent==0.5.8
    - pyod==1.0.7
    - pyparsing==3.0.9
    - python-dateutil==2.8.2
    - pytz==2022.7.1
    - pywavelets==1.4.1
    - pywin32==305
    - pyyaml==5.4.1
    - pyzmq==25.0.0
    - querystring-parser==1.2.4
    - regex==2022.10.31
    - requests==2.28.2
    - scikit-learn==0.23.2
    - scikit-plot==0.3.7
    - scipy==1.5.4
    - seaborn==0.12.2
    - shap==0.41.0
    - six==1.16.0
    - sklearn==0.0.post1
    - slicer==0.0.7
    - smart-open==6.3.0
    - smmap==5.0.0
    - spacy==2.3.9
    - sqlalchemy==1.4.46
    - sqlparse==0.4.3
    - srsly==1.0.6
    - stack-data==0.6.2
    - statsmodels==0.13.5
    - tabulate==0.9.0
    - tangled-up-in-unicode==0.2.0
    - tenacity==8.1.0
    - textblob==0.17.1
    - thinc==7.4.6
    - threadpoolctl==3.1.0
    - tornado==6.2
    - tqdm==4.64.1
    - traitlets==5.8.1
    - typeguard==2.13.3
    - typing-extensions==4.4.0
    - umap-learn==0.5.3
    - urllib3==1.26.14
    - visions==0.7.5
    - waitress==2.1.2
    - wasabi==0.10.1
    - wcwidth==0.2.6
    - websocket-client==1.5.0
    - werkzeug==2.2.2
    - widgetsnbextension==4.0.5
    - wordcloud==1.8.2.2
    - yellowbrick==1.2.1
    - zipp==3.12.0
prefix: C:\Users\username\anaconda3\envs\py38
Upvotes: 0
Views: 1472
Reputation: 1160
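This is probably an issue in the library. The HTML being written contains the Unicode character '\u25c4' (◄, a left-pointing triangle), and the file is opened without an explicit encoding, so on Windows Python falls back to cp1252, which cannot represent that character.
Here is the relevant pycaret source code (write_html in pycaret\internal\plots\residual_plots.py, the last pycaret frame in your traceback):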
def write_html(self, plot_filename):
    """
    Write the current plots to a file in HTML format.
    Parameters
    ----------
    plot_filename: str
        name of the file
    """
    html = self.get_html()
    with open(plot_filename, "w") as f:
        f.write(html)
As mentioned in this Stack Overflow question, it can be solved by specifying the encoding when opening the file:
with open(plot_filename, "w", encoding='utf-8') as f:
    f.write(html)
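To see the difference outside pycaret, here is a minimal sketch that reproduces the failure and the fix. The `html` string and the temporary file path are stand-ins I made up for illustration; the only detail taken from your traceback is the character '\u25c4' and the cp1252 codec.

```python
import os
import tempfile

# Stand-in for the HTML that pycaret generates; it only needs to contain
# the character reported in the traceback (U+25C4).
html = "<html><body>\u25c4</body></html>"

# Hypothetical output path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "plot.html")

# Writing with cp1252 (the default on many Windows setups) fails,
# because cp1252 has no mapping for U+25C4.
try:
    with open(path, "w", encoding="cp1252") as f:
        f.write(html)
    cp1252_failed = False
except UnicodeEncodeError as exc:
    cp1252_failed = True
    print("cp1252 write failed:", exc)

# Writing the same text with an explicit UTF-8 encoding succeeds.
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

# Read the file back to confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

The first write raises the same UnicodeEncodeError as in the question, and the second one goes through unchanged.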
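Since you should not edit the library's installed code, try running the following in the console before running your script, as mentioned in this answer:
chcp 65001
set PYTHONIOENCODING=utf-8
Note that PYTHONIOENCODING only controls the encoding of stdin, stdout, and stderr. If the error persists, also run set PYTHONUTF8=1 (Python 3.7+), which enables Python's UTF-8 mode and makes open() default to UTF-8 when no encoding is given.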
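If changing the console environment is not convenient, another option is a small in-script workaround. The sketch below is only an assumption on my side, not something pycaret provides: `utf8_default_open` is a hypothetical helper that temporarily replaces the built-in `open` so that text-mode files opened without an explicit encoding use UTF-8.

```python
import builtins
from contextlib import contextmanager


@contextmanager
def utf8_default_open():
    """Temporarily make text-mode open() default to UTF-8 (hypothetical helper)."""
    original_open = builtins.open

    def patched_open(file, mode="r", buffering=-1, encoding=None, *args, **kwargs):
        # Only fill in an encoding for text mode; binary mode must not get one.
        if "b" not in mode and encoding is None:
            encoding = "utf-8"
        return original_open(file, mode, buffering, encoding, *args, **kwargs)

    builtins.open = patched_open
    try:
        yield
    finally:
        # Always restore the real open(), even if the wrapped code raises.
        builtins.open = original_open
```

You would then wrap the failing call, i.e. put plot_model(best_model, plot=id, scale=3, save=True) inside a `with utf8_default_open():` block. This patches `open` for everything that runs inside the block, so keep the block as small as possible.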
Upvotes: 1