Klausos Klausos
Klausos Klausos

Reputation: 16050

Incorrect number of rows returned by JOIN query in Python script

I wrote the following code in Python that connects to DB, creates two tables and joins them. Then it prints the result of the JOIN query.

The problem is that the number of rows is 3, though I expected to get 2 rows. Also, if I run the same query in a command prompt using sqlite>, then the number of rows returned by the JOIN is correct, i.e. 2.

import sqlite3 as lite
import pandas as pd

# Connecting to the database. The `connect()` method returns a connection object.
con = lite.connect('getting_started.db')

with con:
    cur = con.cursor()
    cur.execute("DROP TABLE IF EXISTS cities")
    cur.execute("DROP TABLE IF EXISTS weather")

    cur.execute("CREATE TABLE cities (name text, state text)")
    cur.execute("CREATE TABLE weather (city text, year integer, warm_month text, cold_month text, average_high integer)")

    # Filling 'cities' with the data
    cur.execute("INSERT INTO cities VALUES('Washington', 'DC')")
    cur.execute("INSERT INTO cities VALUES('Houston', 'TX')")

    # Filling 'weather' with the data
    cur.execute("INSERT INTO weather VALUES('Washington', 2013, 'July', 'January', 59)")
    cur.execute("INSERT INTO weather VALUES('Houston', 2013, 'July', 'January', 62)")

    # Joining data together
    sql = "SELECT name, state, year, warm_month, cold_month FROM cities " \
          "INNER JOIN weather " \
          "ON name = city"
    cur.execute(sql)

rows = cur.fetchall()
cols = [desc[0] for desc in cur.description]

# Loading data into pandas
df = pd.DataFrame(rows, columns=cols)

for index, row in df.iterrows():
    print("City: {0}, The warmest month: {1}".format(row['name'],row['warm_month']))

In Python the result is:

City: Washington, The warmest month: July
City: Washington, The warmest month: July
City: Houston, The warmest month: July

However, in command prompt the result is different (correct):

City: Washington, The warmest month: July
City: Houston, The warmest month: July

Upvotes: 0

Views: 67

Answers (1)

Tom Dalton
Tom Dalton

Reputation: 6190

The issue is that your rows = cur.fetchall() is outside the con connection context manager, so something odd is happening when you use the cursor and it's DB connection has been closed.

Ref the docs here: https://docs.python.org/2/library/sqlite3.html#using-the-connection-as-a-context-manager it suggests that the with con: provides a transaction, which might explain the strange behaviour executing the statement in the transaction but then trying to use a cursor on the result set outside of the transaction.

It still seems odd to me and I would have expected that this use would have caused sqllite3 to raise an exception telling you this was happening.

Upvotes: 2

Related Questions