Reputation: 9640
I have tried to catch the idea of “Module not found” when importing a Python package within a plpython3u procedure, I did not know how to use the accepted answer there to make it run on Linux. Normally, this here should be a duplicate, but if there is a difference with with MacOS in the used paths, it might be good for a new question.
I am on WSL2 (Ubuntu 20.04). I installed with the commands of the official PostgreSQL Downloads page from the PostgreSQL Apt Repository for Ubuntu.
# Create the file repository configuration:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
# Import the repository signing key:
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
# Update the package lists:
sudo apt-get update
# Install the latest version of PostgreSQL.
# If you want a specific version, use 'postgresql-12' or similar instead of 'postgresql':
sudo apt-get -y install postgresql
This gave me PostgreSQL 13.
I installed plpython3u with commands from PostgreSQL: how to install plpythonu extension:
sudo apt-cache search ".*plpython3.*"
sudo apt-get install postgresql-contrib postgresql-plpython3-13
Because I am on WSL, I start the service with:
service postgresql start
Change role and run psql:
sudo su postgres
psql
Checking the Python version directly as a stored procedure in postgreSQL:
CREATE OR REPLACE FUNCTION return_version()
RETURNS VARCHAR
AS $$
import sys
return sys.version
$$ LANGUAGE plpython3u;
Output:
CREATE FUNCTION
Test:
postgres=# SELECT return_version();
return_version
------------------------------------------
3.8.10 (default, Jun 2 2021, 10:49:15) +
[GCC 9.4.0]
(1 row)
shows that for "PostgreSQL 13", "Python 3.8.10" gets installed.
And this is exactly the one to be found on Linux. If I just enter a Python version that I do not have, like 3.7
:
python3.7
I get the overview:
Command 'python3.7' not found, did you mean:
command 'python3.8' from deb python3.8 (3.8.10-0ubuntu1~20.04)
command 'python3.9' from deb python3.9 (3.9.5-3~20.04.1)
command 'python2.7' from deb python2.7 (2.7.18-1~20.04.1)
Confirming that there are no two parallel installations of Python 3.8
installed, it is clear that the Linux 3.8.10
is the used version for plpython3u in PostgreSQL.
I tried a function that imports packages, taken from Kmeans in PostgreSQL in a nutshell:
CREATE OR replace FUNCTION kmeans(input_table text, columns text[], clus_num int) RETURNS bytea AS
$$
from pandas import DataFrame
from sklearn.cluster import KMeans
from pickle import dumps
all_columns = ",".join(columns)
if all_columns == "":
all_columns = "*"
rv = plpy.execute('SELECT %s FROM %s;' % (all_columns, plpy.quote_ident(input_table)))
frame = []
for i in rv:
frame.append(i)
df = DataFrame(frame).astype('float')
kmeans = KMeans(n_clusters=clus_num, random_state=0).fit(df._get_numeric_data())
return dumps(kmeans)
$$ LANGUAGE plpython3u;
Output:
CREATE FUNCTION
Test throws the error:
postgres=# SELECT kmeans('my_table', ARRAY['col1', 'col2'],3);
ERROR: ModuleNotFoundError: No module named 'pandas'
CONTEXT: Traceback (most recent call last):
PL/Python function "kmeans", line 3, in <module>
from pandas import DataFrame
PL/Python function "kmeans"
Then, I installed pandas in Python 3.8
:
python3.8 -m pip install pandas
Collecting pandas
Downloading pandas-1.3.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.5 MB)
|████████████████████████████████| 11.5 MB 6.0 MB/s
Collecting python-dateutil>=2.7.3
Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
|████████████████████████████████| 247 kB 4.7 MB/s
Collecting numpy>=1.17.3
Downloading numpy-1.21.2-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.8 MB)
|████████████████████████████████| 15.8 MB 131 kB/s
Collecting pytz>=2017.3
Downloading pytz-2021.1-py2.py3-none-any.whl (510 kB)
|████████████████████████████████| 510 kB 5.7 MB/s
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7.3->pandas) (1.14.0)
Installing collected packages: python-dateutil, numpy, pytz, pandas
WARNING: The scripts f2py, f2py3 and f2py3.8 are installed in '/home/my_user/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed numpy-1.21.2 pandas-1.3.2 python-dateutil-2.8.2 pytz-2021.1
And pandas is now stored at:
python3.8 -m pip show pandas
Name: pandas
Version: 1.3.2
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: The Pandas Development Team
Author-email: [email protected]
License: BSD-3-Clause
Location: /home/my_user/.local/lib/python3.8/site-packages
Requires: numpy, python-dateutil, pytz
Required-by:
But the test still throws the same error:
postgres=# SELECT kmeans('my_table', ARRAY['col1', 'col2'],3);
ERROR: ModuleNotFoundError: No module named 'pandas'
CONTEXT: Traceback (most recent call last):
PL/Python function "kmeans", line 3, in <module>
from pandas import DataFrame
PL/Python function "kmeans"
How to install pandas (or any other package I need) so that it is found by the PostgreSQL 13 plpython3u extension on Linux?
The spun-off question had another version test in it, but that was not showing anything from inside the extension, the commands are just the same as in the normal bash. Therefore, the value of the following is reduced to just a random test of the a plpython3u function. I add this only because you should never be 100 % sure. To me, it is clear that this does not say anything about the Python version that is actually used by plpython3u.
CREATE OR REPLACE FUNCTION get_py()
RETURNS VARCHAR
AS $$
import os
return os.popen('which python3').read()
$$ LANGUAGE plpython3u;
Output:
CREATE FUNCTION
Test:
SELECT get_py();
get_py
------------------
/usr/bin/python3+
(1 row)
And the same with 'which python3.8'
:
select get_py();
get_py
--------------------
/usr/bin/python3.8+
(1 row)
Perhaps the default version of Python must be changed somewhere before installation? This is just a very vague guess from an installation using make, with Python settings in /etc/make.conf, here is the code that is used there (question from 2018):
Changing /etc/make.conf
to
DEFAULT_VERSIONS+= python=3.6
It is said that this ran through and installed plpython3u (I have not tested this):
root@db% cd /usr/ports/databases/posgresql10-plpython
root@db% make fetch
root@db% make extract
root@db% cd /usr/ports/databases/postgresql10-plpython/work/postgresql-10.5
root@db% ./configure PYTHON=/usr/local/bin/python3 --with-python --with-libraries=/usr/local/lib --with-includes=/usr/local/include/
root@db% cd /usr/ports/databases/posgresql10-plpython
root@db% make install
Upvotes: 2
Views: 2031
Reputation: 19742
The issue is that this:
python3.8 -m pip install pandas
installs a package(pandas in this case) to the site-packages
in the home directory of the user running the command, so:
/home/my_user/.local/lib/python3.8/site-packages
The plpython3u
extension running in Postgres is looking for the package in the system wide site-packages
. To get the package there you need to do:
sudo python3.8 -m pip install pandas
Upvotes: 3