Reputation: 37807

How to store a dataframe in sqlite3 with a uniqueid for each row

I'm working with a .csv dataset that I got from here.

I managed to write the code below to store the dataframe weather in a sql database/table but I don't know how to create a unique ID (eg. OID1, OID2, ...) in the database/table for each row.

CODE :

import sqlite3 as sql
import pandas as pd

weather = pd.read_csv('londonweather.csv', delimiter=',')
conn = sql.connect('weather.db')
weather.to_sql('weather', conn)

>>> `weather.sample(5)`

Also, whenever I update the dataframe (by adding some rows), these rows have to be inserted in the database/table with a unique ID (OIDx, with x as the row index in the table).

Is it possible to achieve this ? Do you have any propositions, please ?

Upvotes: 0

Answers (3)

Timeless

Reputation: 37807

After some digging and thanks to the SO posts/answers below, I managed to build the right code for my question :

1 - How to store the dataframe in the database :

import sqlite3 as sql
import pandas as pd

### --- READING THE CSV AS A DATAFRAME
df = pd.read_csv('londonweather.csv', delimiter=',')

### --- CREATING THE DATABASE + CURSOR
conn = sql.connect('weather_SO.db')
cur = conn.cursor()

### --- CREATING THE TABLE
cur.execute('''CREATE TABLE IF NOT EXISTS WEATHER_TABLE
           (unique_id TEXT PRIMARY KEY, Year INTEGER, Month INTEGER, Tmax REAL, Tmin REAL, Rain REAL, Sun REAL)''')

### --- COPYING THE DATAFRAME TO THE TABLE
df.to_sql('WEATHER_TABLE', conn, if_exists='append', index=False)

2 - How to create a unique ID (OIDx) :

### --- UPDATING THE VALUE OF THE UNIQUE ID
cur.execute("UPDATE WEATHER_TABLE set unique_id = 'OID' || rowid")

### --- SAVING CHANGES
conn.commit()

3 - How to insert multiple new elements/rows :

### --- INSERTING NEW ROWS THE TABLE
data_weather = [(2022, 7, 20.0, 20.0, 1.0, 2.0), (2022, 8, 19.3, 20.0, 2.0, 8.0)]
cur.executemany('INSERT INTO WEATHER_TABLE(Year, Month, Tmax, Tmin, Rain, Sun) VALUES (?,?,?,?,?,?)', data_weather)

### --- UPDATING THE UNIQUE ID OF THE NEW ROWS
conn.execute("UPDATE WEATHER_TABLE set unique_id = 'OID' || rowid")

### --- SAVING CHANGES AND CLOSING CONNECTION
conn.commit()
conn.close()

RESULT (in SQLiteStudio), showing the last five rows :

Upvotes: 0

itogaston

Reputation: 399

import sqlite3 as sql
import pandas as pd

weather = pd.read_csv("weather.csv", delimiter=",")
conn = sql.connect("weather.db")

weather["OID"] = list(map(lambda x: "OID" + str(int(x)), weather.index))

print(weather.head())

output:

   Year  Month  Tmax  Tmin  Rain    Sun   OID
0  1957      1   8.7   2.7  39.5   53.0  OID0
1  1957      2   9.0   2.9  69.8   64.9  OID1
2  1957      3  13.9   5.7  25.4   96.7  OID2
3  1957      4  14.2   5.2   5.7  169.6  OID3
4  1957      5  16.2   6.5  21.3  195.0  OID4

rest of the code:

schema = pd.io.json.build_table_schema(weather, primary_key="OID")
weather.to_sql("weather", conn, schema)

print(conn.execute("SELECT * FROM weather").fetchone())

output:

(0, 1957, 1, 8.7, 2.7, 39.5, 53.0, 'OID0')

If you want to add the OID to new rows execute this line again:

weather["OID"] = list(map(lambda x: "OID" + str(int(x)), weather.index))

Just make sure you don't change the index of any of the rows

Upvotes: 1

PChemGuy

Reputation: 1701

SQLite creates ROWID for you, unless you specifically instructed it not to do so.