Reputation: 161
I am creating a python script using pandas to read through a file which has multiple row values.
Once read, I need to build an array of these values and then assign it to a dataframe row value.
The code I have used is
import re
import numpy as np
import pandas as pd
master_data = pd.DataFrame()
temp_df = pd.DataFrame()
new_df = pd.DataFrame()
for f in data:
##Reading the file in pandas which is in excel format
#
file_df = pd.read_excel(f)
filename = file_df['Unnamed: 1'][2]
##Skipping first 24 rows to get the required reading values
column_names = ['start_time','xxx_value']
data_df = pd.read_excel(f, names=column_names, skiprows=25)
array =np.array([])
for i in data_df.iterrows():
array = np.append(array,i[1][1])
temp_df['xxx_value'] = [array]
temp_df['Filename'] = filename
temp_df['sub_id']=
temp_df['Filename'].str.split('_',1).str[1].str.strip()
temp_df['sen_site']=
temp_df['Filename'].str.split('_',1).str[0].str.strip()
temp_df['sampling_interval'] = 15
temp_df['start_time'] = data_df['start_time'][2]
new_df= new_df.append(xxx_df)
new_df.index = new_df.index + 1
new_df=new_df.sort_index()
new_df.index.name='record_id'
new_df = new_df.drop("Filename",1) ##dropping the Filename as it
is not needed to be loaded in postgresql
##Rearrange to postgresql format
column_new_df = new_df.columns.tolist()
column_new_df.
insert(4,column_new_df.pop(column_new_df.index('xxx_value')))
new_df = new_df.reindex(columns = column_new_df)
print(new_df)
This code is not working when I try to insert the array data into Postgresql.
It gives me an error stating:
ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'numpy.ndarray'
Upvotes: 13
Views: 20607
Reputation: 1
from pgvector.psycopg2 import register_vector
db_connection = psycopg2.connect(dsn)
register_vector(db_connection)
You need to import register_vector so that it can work with numpy arrays. This is also needed if you're using PG Vector extension for PostgreSQL.
Upvotes: 0
Reputation: 41
We can address the issue by extracting one element at a time. Here I'm assuming for a dataframe temp_df
, sub_id
of type numpy.int64
, we can directly extract the values using the iloc
and item as temp_df.iloc[0]['sub_id'].item()
and we can push that in DB.
Upvotes: 0
Reputation: 2256
In my opinion, the most effective way would be to make psycopg2 always aware of np.ndarray(s). One could do that by registering an adapter:
import numpy as np
from psycopg2.extensions import register_adapter, AsIs
def addapt_numpy_array(numpy_array):
return AsIs(tuple(numpy_array))
register_adapter(np.ndarray, addapt_numpy_array)
To help working with numpy in general, my default addon to scripts/libraries dependent on psycopg2 is:
import numpy as np
from psycopg2.extensions import register_adapter, AsIs
def addapt_numpy_float64(numpy_float64):
return AsIs(numpy_float64)
def addapt_numpy_int64(numpy_int64):
return AsIs(numpy_int64)
def addapt_numpy_float32(numpy_float32):
return AsIs(numpy_float32)
def addapt_numpy_int32(numpy_int32):
return AsIs(numpy_int32)
def addapt_numpy_array(numpy_array):
return AsIs(tuple(numpy_array))
register_adapter(np.float64, addapt_numpy_float64)
register_adapter(np.int64, addapt_numpy_int64)
register_adapter(np.float32, addapt_numpy_float32)
register_adapter(np.int32, addapt_numpy_int32)
register_adapter(np.ndarray, addapt_numpy_array)
otherwise there would be some issues even with numerical types.
I got the adapter trick from this other stackoverflow entry.
Upvotes: 4
Reputation: 4629
Convert each numpy array element to its equivalent list using apply
and tolist
first, and then you should be able to write the data to Postgres:
df['column_name'] = df['column_name'].apply(lambda x: x.tolist())
Upvotes: 1
Reputation: 77
I am not sure where the problem is, as I can't see in your code the part where you insert the data into Postgres.
My guess though is that you are giving Postgres a Numpy array: psycopg2 can't handle Numpy data types, but it should be fairly easy to convert it to native Python types that work with psycopg2 (e.g. by using the .tolist(method), it is difficult to give more precise information without the code).
Upvotes: 3