Reputation: 13
I am quite new to both python and sqlite, so please bear with me, as my approach is certainly quite bootstrapped! I am trying to append multiple csv files to an existing table within an sqlite3 database using python 3.6. The code I have written combines the individual csv files into a single pandas data frame, which I then clean up by adding/combining/deleting columns to make it match the columns within the sqlite data frame. It then exports the new data frame as a csv file. I have managed to add this new csv file to the existing database, creating a new table in the database. What I want to do is add the data from the new table to an existing table within the database, so I tried to use a UNION statement, but it returns the following error "ValueError: parameters are of unsupported type". I know that when I look at the new table I created in the database, some of the columns are type "REAL" instead of text (despite converting them all to 'str' before exporting the csv), while all the columns in the table I want to join together using UNION are type "TEXT", so I suspect either this or the UNION statement itself is the problem, but I am not sure which and not sure how to fix either. Any help is greatly appreciated!!
import sqlite3 import os import pandas as pd import numpy as np
def add_CO2_files_to_database (files=None):
# If a list of filepaths isn't specified, use every csv file in the same
# directory as the script
if files is None:
# Get the current directory
cwd = os.getcwd()
#Add every csv file that starts with 'FD_' to a list
files = [cwd+'\\'+f for f in os.listdir(cwd) if f.startswith('FD_')]
#Merge the csv files above into single pandas data frame and add a column
for file in files:
df = pd.concat([pd.read_csv(fp).assign(file_name=os.path.basename(fp).split('.')[0]) for fp in files])
#Create a new column called 'FD_serial' from the 'file_name' column
#that consists of just the FD serial number
df['FD_serial'] = df['file_name'].str[0:11]
#Create another column that combines the 'Day', 'Month', and 'Year'
#columns into 1 column called 'date'
df['date'] = df['Month'].map(str)+'-'+df['Day'].map(str)+'-'+df['Year'].map(str)
#Combine columns 'date' and 'Time' into a column called 'date_time'
#then convert column to datetime format
df['date_time'] = pd.to_datetime(df['date'] + ' '+ df['Time'])
#Create new column called 'id' that combines the FD serial number
#'FD_serial' and the timestamp 'date_time' so each line of data has a
#unique identifier in the database
df['id'] = df['FD_serial'].map(str) + '_' + df['date'].map(str) + '_' + df['Time'].map(str)
#Add column 'location' and populate with 'NaN'
df['location'] = np.nan
#Delete unneccesary columns: 'Month', 'Day', 'Year', 'Time', 'date', 'file_name'
df = df.drop(["Month", "Day", "Year", "Time", "date", "file_name", "Mode"], axis=1)
#Rename columns to match the SQLite database conventions
df = df.rename({'Flux':'CO2_flux', 'Temperature (C)':'temp', 'CO2 Soil (ppm) ':'soil_CO2', 'CO2 Soil STD (ppm)':'soil_STD',
'CO2 ATM (ppm)':'atm_CO2', 'CO2 ATM STD (ppm)':'atm_std'}, axis='columns')
#Change data types of all columns to 'str' so it matches the data type in the database
df = df.astype(str)
#Save the merged data frame in a csv file called 'FD_CO2_data.csv'
df.to_csv("FD_CO2_data.csv", index=False)
#Connect to the SQLite Database and create a cursor
conn = sqlite3.connect("email_TEST.db")
cur = conn.cursor()
#Read in the csv file 'FD_CO2_data.csv' that was created above
df = pd.read_csv('FD_CO2_data.csv')
#Add the csv file to the database as a new table
df.to_sql('FD_CO2', conn, if_exists='append', index=False)
#df_db = pd.read_sql_query("select * from FD_CO2 limit 5;", conn)
cur.execute("SELECT id, FD_serial, date_time, CO2_flux, temp, Soil_CO2, soil_STD, atm_CO2, atm_STD, location FROM CO2 UNION SELECT id, FD_serial, date_time, CO2_flux, temp, Soil_CO2, soil_STD, atm_CO2, atm_STD, location FROM FD_CO2", conn)
add_CO2_files_to_database()
Upvotes: 1
Views: 989
Reputation: 6520
Inserting the rows from the new table into the existing table should be as easy as
cur.execute("INSERT into CO2 select * from FD_CO2")
This assumes that the columns in FD_CO2 map directly to the columns in CO2 and there will not be an insert conflicts, eg duplicate key. You will need a cur.commit()
to commit the rows to the database
A UNION
in sqlite is a compound query, it is essentially the same as a union in mathematics; it returns the UNION of two "sets" i.e. selects.
The error "ValueError: parameters are of unsupported type"
is because of the conn
argument to the execute
. An execute takes arguments when the sql statement is parametrized, ie takes arguments. Here's the python doc on the subject.
Upvotes: 1