Liorb
Liorb

Reputation: 21

Difference between cursor.fetchall() and pandas Dataframe

I know there are a few ways to retrieve data from RDB table. One with pandas as read_sql, the other with cursore.fetchall(). What are the main differences between both ways in terms of:

  1. memory usage - is df less reccomended?
  2. performance - selecting data from a table (e.g. large set of data)
  3. performace - inserting data with a loop for cursor vs df.to_sql.

Thanks!

Upvotes: 2

Views: 6371

Answers (1)

Konstantinos
Konstantinos

Reputation: 4386

That's an interesting question. For a ~10GB SQLite database, I get the following results for your second question. pandas.sql_query seems comparable to speed with the cursor.fetchall.

The rest I leave as an exercise. :D

import sqlite3
import time
import pandas as pd

def db_operation(db_name):
    connection = sqlite3.connect(db_name)
    c = connection.cursor()
    yield c
    connection.commit()
    connection.close()

start = time.perf_counter()
for i in range(0, 10):
    with db_operation('components.db') as c:
            c.execute('''SELECT * FROM hashes WHERE (weight > 50 AND weight < 100)''')
            fetchall = c.fetchall()

time.perf_counter() - start
2.967 # fractional seconds


start = time.perf_counter()
for i in range(0, 10):
    connection = sqlite3.connect('components.db')
    sql_query = pd.read_sql_query('''SELECT * FROM hashes WHERE (weight > 50 AND weight < 100)''', con = connection)
    connection.commit()
    connection.close()

time.perf_counter() - start
2.983 # fractional seconds

The difference is that cursor.fetchall() is a bit more spartan (=plain). pandas.read_sql_query returns a <class 'pandas.core.frame.DataFrame'> and so you can use all the methods of pandas.DataFrame, like pandas.DataFrame.to_latex, pandas.DataFrame.to_csv pandas.DataFrame.to_excel, etc. (documentation link)

One can accomplish the same exact goals with cursor.fetchall, but needs to press some or a lot extra keys.

Upvotes: 3

Related Questions