Reputation: 49
Hi guys:) I'm a newbie at programming and would like to ask for help in creating a function to help reduce redundancy in my code. I have successfully created a database holding 5 different tables for data of different countries. All tables have the same structure (see attached screenshots for reference). My objective is to calculate the summation of all rows within all the different tables for a particular parameter (type of pollution). I have managed to write code to only select the particular data I need of one country (I tried writing code to calculate the summation but I can't figure that out, so I decided to just select the data and then manually calculate the values myself with a calculator -I know that sort of defeats the purpose of programming but at my programming level (beginner) I feel like it's the only way that I can do the code) my issue is that I have five countries, so I don't want to repeat the same block of code for the different countries. this is my code for one country:
def read_MaltaData():
conn = sqlite3.connect('FinalProjectDatabase.sqlite3')
Malta = conn.cursor()
Malta.execute("SELECT * FROM MaltaData WHERE AirPollutant = 'PM10'")
result = Malta.fetchall()
print(result)
my result is this:
[('Malta', 'Valletta', 'MT00005', 'Msida', 'PM10', 64.3, 'ug/m3', 'Traffic', 'urban', 14.489985999999998, 35.895835999489535, 2.0), ('Malta', None,
etc.
(I am going to manually calculate the data I require -in this case 64.3 + the value from the next row- as I don't know how to do it in python)
To clarify, my aim isn't to have a sum total across all the tables as one whole value (i.e. I don't want to add the values of all the countries all together). My desired output should look something like this:
Malta summation value
italy summation value
france summation value
and not like this
countries all together = one whole value (i.e. all summation values added together)
I would greatly appreciate any help I can get. Unfortunately I am not able to share the database with you, which is why I am sharing screenshots of it instead.
image of all 5 different tables in one database:
image of one table (all tables look the same, just with different values)
Upvotes: 0
Views: 256
Reputation: 10238
If you pass the country name as argument to the data retrieval function, you can generate the table names dynamically (note the f-string arguments in execute
and print
):
def print_CountryData(country):
conn = sqlite3.connect('FinalProjectDatabase.sqlite3')
cur = conn.cursor()
cur.execute(f"SELECT SUM(AirPollutionLevel) FROM {country}Data WHERE AirPollutant = 'PM10'")
sumVal = cur.fetchone()[0]
print(f"{country} {sumVal}")
# example call:
for country in ('France', 'Germany', 'Italy', 'Malta', 'Poland'):
print_CountryData(country)
While building query strings your own with simple string functions is discouraged in the sqlite3 documentation for security reasons, in your very case where you have total control of the actual arguments I'd consider it as safe.
This answer adapts the summation from the great answer given by forpas but refuses to move the repetition to SQL. It also shows both integration with python and output formatting.
This is an improved version of my first answer, transformed into a Minimal, Reproducible Example and combined with output. Also, some performance improvements were made, for instance opening the database only once.
import sqlite3
import random # to simulate actual pollution values
# Countries we have data for
countries = ('France', 'Germany', 'Italy', 'Malta', 'Poland')
# There is one table for each country
def tableName(country):
return country+'Data'
# Generate minimal version of tables filled with random data
def setup_CountryData(cur):
for country in countries:
cur.execute(f'''CREATE TABLE {tableName(country)}
(AirPollutant text, AirPollutionLevel real)''')
for i in range(5):
cur.execute(f"""INSERT INTO {tableName(country)} VALUES
('PM10', {100*random.random()})""")
# Get sum up pollution data for each country
def print_CountryData(cur):
for country in countries:
cur.execute(f"""SELECT SUM(AirPollutionLevel) FROM
{tableName(country)} WHERE AirPollutant = 'PM10'""")
sumVal = cur.fetchone()[0]
print(f"{country:10} {sumVal:9.5f}")
# For testing, we use an in-memory database
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
setup_CountryData(cur)
# The functionality actually required
print_CountryData(cur)
Sample output:
France 263.79430
Germany 245.20942
Italy 225.72068
Malta 167.72690
Poland 290.64190
It's often hard to evaluate a solution without actually trying it. That's the reason why questioners on StackOverflow are constantly encouraged to ask in this style: it makes it much more likely someone will understand and fix the problem ... quickly
Upvotes: 1
Reputation: 2260
If the database is not too big you could use pandas
.
This approach is less efficient than using SQL queries directly but can be used if you want to explore the data interactively in a notebook for example.
You can create a dataframe from your SQLite db using pandas.read_sql_query
and then perform your calculation using pandas.DataFrame
methods, which are designed for this type of tasks.
For your specific case:
import sqlite3
import pandas as pd
conn = sqlite3.connect(db_file)
query = "SELECT * FROM MaltaData WHERE AirPollutant = 'PM10'"
df = pd.read_sql_query(query, conn)
# check dataframe content
print(df.head())
If I understood and then you want to compute the sum of the values in a given column:
s = df['AirPollutionLevel'].sum()
If you have missing values you might want to fill them with 0s before summing:
s = df['AirPollutionLevel'].fillna(0).sum()
Upvotes: 0
Reputation: 164089
You can use UNION ALL to get a row for each country:
SELECT 'France' country, SUM(AirPolutionLevel) [summation value] FROM FranceData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Germany' country, SUM(AirPolutionLevel) [summation value] FROM GermanyData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Italy' country, SUM(AirPolutionLevel) [summation value] FROM ItalyData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Malta' country, SUM(AirPolutionLevel) [summation value] FROM MaltaData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Poland' country, SUM(AirPolutionLevel) [summation value] FROM PolandData WHERE AirPollutant = 'PM10'
Upvotes: 2