function to reduce redundancy for reading database in sqlite3

Hi guys:) I'm a newbie at programming and would like to ask for help in creating a function to help reduce redundancy in my code. I have successfully created a database holding 5 different tables for data of different countries. All tables have the same structure (see attached screenshots for reference). My objective is to calculate the summation of all rows within all the different tables for a particular parameter (type of pollution). I have managed to write code to only select the particular data I need of one country (I tried writing code to calculate the summation but I can't figure that out, so I decided to just select the data and then manually calculate the values myself with a calculator -I know that sort of defeats the purpose of programming but at my programming level (beginner) I feel like it's the only way that I can do the code) my issue is that I have five countries, so I don't want to repeat the same block of code for the different countries. this is my code for one country:

def read_MaltaData():
    conn = sqlite3.connect('FinalProjectDatabase.sqlite3')
    Malta = conn.cursor()
    Malta.execute("SELECT * FROM MaltaData WHERE AirPollutant = 'PM10'")
    result = Malta.fetchall()
    print(result)

my result is this:

[('Malta', 'Valletta', 'MT00005', 'Msida', 'PM10', 64.3, 'ug/m3', 'Traffic', 'urban', 14.489985999999998, 35.895835999489535, 2.0), ('Malta', None, etc.

(I am going to manually calculate the data I require -in this case 64.3 + the value from the next row- as I don't know how to do it in python)

To clarify, my aim isn't to have a sum total across all the tables as one whole value (i.e. I don't want to add the values of all the countries all together). My desired output should look something like this:

Malta    summation value
italy    summation value
france   summation value

and not like this

countries all together = one whole value (i.e. all summation values added together)

I would greatly appreciate any help I can get. Unfortunately I am not able to share the database with you, which is why I am sharing screenshots of it instead.

image of all 5 different tables in one database: enter image description here

image of one table (all tables look the same, just with different values) enter image description here

Upvotes: 0

Views: 256

Answers (3)

Wolf
Wolf

Reputation: 10238

If you pass the country name as argument to the data retrieval function, you can generate the table names dynamically (note the f-string arguments in execute and print):

First draft

def print_CountryData(country):
    conn = sqlite3.connect('FinalProjectDatabase.sqlite3')
    cur = conn.cursor()
    cur.execute(f"SELECT SUM(AirPollutionLevel) FROM {country}Data WHERE AirPollutant = 'PM10'")
    sumVal = cur.fetchone()[0]
    print(f"{country} {sumVal}")

# example call:
for country in ('France', 'Germany', 'Italy', 'Malta', 'Poland'):
    print_CountryData(country)

While building query strings your own with simple string functions is discouraged in the sqlite3 documentation for security reasons, in your very case where you have total control of the actual arguments I'd consider it as safe.

This answer adapts the summation from the great answer given by forpas but refuses to move the repetition to SQL. It also shows both integration with python and output formatting.

MRE-style version

This is an improved version of my first answer, transformed into a Minimal, Reproducible Example and combined with output. Also, some performance improvements were made, for instance opening the database only once.

import sqlite3
import random # to simulate actual pollution values

# Countries we have data for
countries = ('France', 'Germany', 'Italy', 'Malta', 'Poland')

# There is one table for each country
def tableName(country):
    return country+'Data'

# Generate minimal version of tables filled with random data
def setup_CountryData(cur):
    for country in countries:
        cur.execute(f'''CREATE TABLE {tableName(country)}
                (AirPollutant text, AirPollutionLevel real)''')
        for i in range(5):
            cur.execute(f"""INSERT INTO {tableName(country)} VALUES 
                    ('PM10', {100*random.random()})""")
                    
# Get sum up pollution data for each country
def print_CountryData(cur):
    for country in countries:
        cur.execute(f"""SELECT SUM(AirPollutionLevel) FROM 
                {tableName(country)} WHERE AirPollutant = 'PM10'""")
        sumVal = cur.fetchone()[0]
        print(f"{country:10} {sumVal:9.5f}")

# For testing, we use an in-memory database
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
setup_CountryData(cur)

# The functionality actually required
print_CountryData(cur)

Sample output:

France     263.79430
Germany    245.20942
Italy      225.72068
Malta      167.72690
Poland     290.64190

It's often hard to evaluate a solution without actually trying it. That's the reason why questioners on StackOverflow are constantly encouraged to ask in this style: it makes it much more likely someone will understand and fix the problem ... quickly

Upvotes: 1

dzang
dzang

Reputation: 2260

If the database is not too big you could use pandas.

This approach is less efficient than using SQL queries directly but can be used if you want to explore the data interactively in a notebook for example.

You can create a dataframe from your SQLite db using pandas.read_sql_query

and then perform your calculation using pandas.DataFrame methods, which are designed for this type of tasks.

For your specific case:

import sqlite3
import pandas as pd

conn = sqlite3.connect(db_file)

query = "SELECT * FROM MaltaData WHERE AirPollutant = 'PM10'"
df = pd.read_sql_query(query, conn)

# check dataframe content
print(df.head())

If I understood and then you want to compute the sum of the values in a given column:

s = df['AirPollutionLevel'].sum()

If you have missing values you might want to fill them with 0s before summing:

s = df['AirPollutionLevel'].fillna(0).sum()

Upvotes: 0

forpas
forpas

Reputation: 164089

You can use UNION ALL to get a row for each country:

SELECT 'France' country, SUM(AirPolutionLevel) [summation value] FROM FranceData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Germany' country, SUM(AirPolutionLevel) [summation value] FROM GermanyData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Italy' country, SUM(AirPolutionLevel) [summation value] FROM ItalyData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Malta' country, SUM(AirPolutionLevel) [summation value] FROM MaltaData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Poland' country, SUM(AirPolutionLevel) [summation value] FROM PolandData WHERE AirPollutant = 'PM10'

Upvotes: 2

Related Questions