Dee
Dee

Reputation: 13

How to merge two SQLite tables in python using UNION statement (currently getting "ValueError: parameters are of unsupported type" error message)

I am quite new to both python and sqlite, so please bear with me, as my approach is certainly quite bootstrapped! I am trying to append multiple csv files to an existing table within an sqlite3 database using python 3.6. The code I have written combines the individual csv files into a single pandas data frame, which I then clean up by adding/combining/deleting columns to make it match the columns within the sqlite data frame. It then exports the new data frame as a csv file. I have managed to add this new csv file to the existing database, creating a new table in the database. What I want to do is add the data from the new table to an existing table within the database, so I tried to use a UNION statement, but it returns the following error "ValueError: parameters are of unsupported type". I know that when I look at the new table I created in the database, some of the columns are type "REAL" instead of text (despite converting them all to 'str' before exporting the csv), while all the columns in the table I want to join together using UNION are type "TEXT", so I suspect either this or the UNION statement itself is the problem, but I am not sure which and not sure how to fix either. Any help is greatly appreciated!!

import sqlite3 import os import pandas as pd import numpy as np

def add_CO2_files_to_database (files=None):

# If a list of filepaths isn't specified, use every csv file in the same
# directory as the script
if files is None:

    # Get the current directory
    cwd = os.getcwd()

    #Add every csv file that starts with 'FD_' to a list
    files = [cwd+'\\'+f for f in os.listdir(cwd) if f.startswith('FD_')]

#Merge the csv files above into single pandas data frame and add a column 
for file in files:

    df = pd.concat([pd.read_csv(fp).assign(file_name=os.path.basename(fp).split('.')[0]) for fp in files])

    #Create a new column called 'FD_serial' from the 'file_name' column 
    #that consists of just the FD serial number
    df['FD_serial'] = df['file_name'].str[0:11]

    #Create another column that combines the 'Day', 'Month', and 'Year' 
    #columns into 1 column called 'date'
    df['date'] = df['Month'].map(str)+'-'+df['Day'].map(str)+'-'+df['Year'].map(str)  

    #Combine columns 'date' and 'Time' into a column called 'date_time'
    #then convert column to datetime format
    df['date_time'] = pd.to_datetime(df['date'] + ' '+ df['Time'])

    #Create new column called 'id' that combines the FD serial number 
    #'FD_serial' and the timestamp 'date_time' so each line of data has a 
    #unique identifier in the database
    df['id'] = df['FD_serial'].map(str) + '_' + df['date'].map(str) + '_' + df['Time'].map(str)

    #Add column 'location' and populate with 'NaN'
    df['location'] = np.nan

    #Delete unneccesary columns: 'Month', 'Day', 'Year', 'Time', 'date', 'file_name'
    df = df.drop(["Month", "Day", "Year", "Time", "date", "file_name", "Mode"], axis=1)

    #Rename columns to match the SQLite database conventions
    df = df.rename({'Flux':'CO2_flux', 'Temperature (C)':'temp', 'CO2 Soil (ppm) ':'soil_CO2', 'CO2 Soil STD (ppm)':'soil_STD',
               'CO2 ATM (ppm)':'atm_CO2', 'CO2 ATM STD (ppm)':'atm_std'}, axis='columns')

    #Change data types of all columns to 'str' so it matches the data type in the database
    df = df.astype(str)

    #Save the merged data frame in a csv file called 'FD_CO2_data.csv'
    df.to_csv("FD_CO2_data.csv", index=False)

THE SECTION OF CODE BELOW ADDS THE CSV FILE CREATED ABOVE TO THE DATABASE

#Connect to the SQLite Database and create a cursor   
conn = sqlite3.connect("email_TEST.db")
cur = conn.cursor()

#Read in the csv file 'FD_CO2_data.csv' that was created above
df = pd.read_csv('FD_CO2_data.csv')

#Add the csv file to the database as a new table
df.to_sql('FD_CO2', conn, if_exists='append', index=False)

#df_db = pd.read_sql_query("select * from FD_CO2 limit 5;", conn)
cur.execute("SELECT id, FD_serial, date_time, CO2_flux, temp, Soil_CO2, soil_STD, atm_CO2, atm_STD, location  FROM CO2 UNION SELECT id, FD_serial, date_time, CO2_flux, temp, Soil_CO2, soil_STD, atm_CO2, atm_STD, location FROM FD_CO2", conn)

print(df_db)

add_CO2_files_to_database()

Upvotes: 1

Views: 989

Answers (1)

DinoCoderSaurus
DinoCoderSaurus

Reputation: 6520

Inserting the rows from the new table into the existing table should be as easy as

cur.execute("INSERT into CO2 select * from FD_CO2")

This assumes that the columns in FD_CO2 map directly to the columns in CO2 and there will not be an insert conflicts, eg duplicate key. You will need a cur.commit() to commit the rows to the database

A UNION in sqlite is a compound query, it is essentially the same as a union in mathematics; it returns the UNION of two "sets" i.e. selects.

The error "ValueError: parameters are of unsupported type" is because of the conn argument to the execute. An execute takes arguments when the sql statement is parametrized, ie takes arguments. Here's the python doc on the subject.

Upvotes: 1

Related Questions