Xin

Reputation: 674

How to use multiple inputs in function and append results?

Below is my function to grab the weather data for a specific city and year. How would I modify it so I could grab more than one city and more than one year at a time and append the results to the same dataset?

import pandas as pd

def calgary_weather(city, year):
    d = dict(calgary = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=27211&Year="+year+"&Month=5&Day=1&timeframe=2&submit=Download+Data",
             edmonton = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=27793&Year="+year+"&Month=5&Day=1&timeframe=2&submit=Download+Data")
    url = d[city]
    data = pd.read_csv(url)
    return data

calgary_weather('edmonton', '2017')

Upvotes: 0

Views: 105

Answers (3)

Trenton McKinney

Reputation: 62403

  • There shouldn't be more than one URL in the function; hard-coding a separate URL per city doesn't scale (imagine 30+ locations).
  • Use a dictionary that maps each station name to its station ID
    • stations can have as many key-value pairs as you like
  • The function will take a list of city names and a list of years
  • url will use f-strings to insert stationID and Year values
  • Build a list with one dataframe per city and year
  • Concatenate them with pd.concat
  • A dataframe will be returned
  • Call the function with a list of only the desired cities and years.
  • The function also uses type hints for clarity
    • List[str]: list of strings
    • List[int]: list of integers
import pandas as pd
from typing import List

def get_weather_data(cities: List[str], years: List[int]) -> pd.DataFrame:
    # map each city name to its station ID
    stations = {'calgary': 27211, 'edmonton': 27793}
    df_list = list()
    for city in cities:
        for year in years:
            # one CSV download per city/year combination
            url = f'https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID={stations[city]}&Year={year}&Month=5&Day=1&timeframe=2&submit=Download+Data'
            df_list.append(pd.read_csv(url))

    # stack all downloads into a single dataframe
    return pd.concat(df_list)


cities = ['calgary', 'edmonton']
years = [2017, 2018, 2019]

df = get_weather_data(cities=cities, years=years)

Top 3 rows and first 5 columns

print(df.iloc[:3, :5].to_markdown())

|    |   Longitude (x) |   Latitude (y) | Station Name     |   Climate ID | Date/Time   |
|---:|----------------:|---------------:|:-----------------|-------------:|:------------|
|  0 |            -114 |          51.11 | CALGARY INT'L CS |      3031094 | 2017-01-01  |
|  1 |            -114 |          51.11 | CALGARY INT'L CS |      3031094 | 2017-01-02  |
|  2 |            -114 |          51.11 | CALGARY INT'L CS |      3031094 | 2017-01-03  |

Bottom 3 rows and first 5 columns

print(df.iloc[-3:, :5].to_markdown())

|     |   Longitude (x) |   Latitude (y) | Station Name              |   Climate ID | Date/Time   |
|----:|----------------:|---------------:|:--------------------------|-------------:|:------------|
| 362 |         -113.61 |          53.31 | EDMONTON INTERNATIONAL CS |      3012206 | 2019-12-29  |
| 363 |         -113.61 |          53.31 | EDMONTON INTERNATIONAL CS |      3012206 | 2019-12-30  |
| 364 |         -113.61 |          53.31 | EDMONTON INTERNATIONAL CS |      3012206 | 2019-12-31  |
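pd.concat keeps each frame's original index, so the index restarts at 0 for every city/year download; that is why the bottom rows are labelled 362 to 364 instead of continuing the count. If you want a continuous index, or want to record which download each row came from, pd.concat accepts ignore_index=True or keys=[...]. The sketch below uses two tiny made-up DataFrames (hypothetical placeholder values, not real weather data) so it runs offline.

```python
import pandas as pd

# Hypothetical stand-ins for two downloaded city/year frames (not real data)
calgary_2017 = pd.DataFrame({"Station Name": ["CALGARY", "CALGARY"], "Max Temp": [1.0, 2.0]})
edmonton_2017 = pd.DataFrame({"Station Name": ["EDMONTON", "EDMONTON"], "Max Temp": [3.0, 4.0]})
frames = [calgary_2017, edmonton_2017]

# Default behaviour: each frame keeps its own 0-based index, so labels repeat
stacked = pd.concat(frames)
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True renumbers the rows 0..n-1 across all frames
renumbered = pd.concat(frames, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# keys=... adds an outer index level naming the source of each row
labelled = pd.concat(frames, keys=["calgary_2017", "edmonton_2017"])
print(labelled.loc["edmonton_2017"])
```

In get_weather_data that would simply mean returning pd.concat(df_list, ignore_index=True), or passing keys built from the same city/year loop.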

Upvotes: 4

Naphat Amundsen

Reputation: 1623

You could do something like the code below, although every city will "share" the same years (each city is fetched for every year in the list). Note how I changed the strings in the dictionary to contain a {year} placeholder, which str.format fills in. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the frames are collected in a list and joined once with pd.concat.

import pandas as pd
from typing import List

def calgary_weather(cities: List[str], years: List[str]) -> pd.DataFrame:
    d = dict(calgary = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=27211&Year={year}&Month=5&Day=1&timeframe=2&submit=Download+Data",
             edmonton = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=27793&Year={year}&Month=5&Day=1&timeframe=2&submit=Download+Data")

    frames = []

    for city in cities:
        for year in years:
            # fill the {year} placeholder in this city's URL
            url = d[city].format(year=year)
            frames.append(pd.read_csv(url))

    return pd.concat(frames)

calgary_weather(['edmonton','calgary'], ['2016','2017','2018'])
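The nested loops request every city for every year, i.e. the Cartesian product of the two lists. A minimal sketch, assuming you just want to inspect which (city, year, URL) triples would be requested before downloading anything: itertools.product yields the same pairs in the same order as the nested for-loops. URL_TEMPLATES and planned_requests are hypothetical names; the URLs are the ones from the question with a {year} placeholder.

```python
from itertools import product

# Hypothetical lookup: the question's URLs with a {year} placeholder
URL_TEMPLATES = {
    "calgary": "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=27211&Year={year}&Month=5&Day=1&timeframe=2&submit=Download+Data",
    "edmonton": "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=27793&Year={year}&Month=5&Day=1&timeframe=2&submit=Download+Data",
}

def planned_requests(cities, years):
    # product(cities, years) gives the same pairs as
    # "for city in cities: for year in years", i.e. every city with every year
    return [(city, year, URL_TEMPLATES[city].format(year=year))
            for city, year in product(cities, years)]

requests = planned_requests(["edmonton", "calgary"], ["2016", "2017", "2018"])
print(len(requests))  # 6
```

Two cities times three years gives six downloads; each URL would then be passed to pd.read_csv.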

If you want to choose the years separately for each city, pass years as a list of lists; zip pairs each city with the inner list at the same position:

from typing import List

def calgary_weather2(cities: List[str], years: List[List[str]]) -> pd.DataFrame:
    d = dict(calgary = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=27211&Year={year}&Month=5&Day=1&timeframe=2&submit=Download+Data",
             edmonton = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=27793&Year={year}&Month=5&Day=1&timeframe=2&submit=Download+Data")

    # DataFrame.append was removed in pandas 2.0, so collect frames and concat once
    frames = []

    for city, cityyears in zip(cities, years):
        for year in cityyears:
            url = d[city].format(year=year)
            frames.append(pd.read_csv(url))

    return pd.concat(frames)

calgary_weather2(['edmonton','calgary'], [['2016','2017','2018'],['2012','2013']])
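One caveat with parallel lists: zip stops at the shorter input, so a missing inner list silently drops a city instead of raising an error (on Python 3.10+, zip(..., strict=True) raises a ValueError instead). A hedged alternative, assuming you are free to change the call signature, is to pass a dict that maps each city to its own years so the pairing is explicit. URL_TEMPLATE, STATIONS and urls_for are hypothetical names; the station IDs come from the question's URLs. The sketch only builds the URLs; each one would then be passed to pd.read_csv and the results combined with pd.concat.

```python
# Hypothetical single URL template with placeholders for station and year
URL_TEMPLATE = ("https://climate.weather.gc.ca/climate_data/bulk_data_e.html"
                "?format=csv&stationID={station}&Year={year}&Month=5&Day=1&timeframe=2&submit=Download+Data")
STATIONS = {"calgary": 27211, "edmonton": 27793}

def urls_for(city_years):
    # city_years maps each city name to the list of years wanted for that city
    urls = []
    for city, years in city_years.items():
        for year in years:
            urls.append(URL_TEMPLATE.format(station=STATIONS[city], year=year))
    return urls

urls = urls_for({"edmonton": ["2016", "2017", "2018"], "calgary": ["2012", "2013"]})
print(len(urls))  # 5
```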

Upvotes: 2

Sam

Reputation: 313

def weather(*args: str):
    # sort each argument into a year (digits only) or a city name
    years = []
    cities = []
    for arg in args:
        if arg.isdigit():
            years.append(arg)
        else:
            cities.append(arg)
    return cities, years

I prefer passing multiple arguments over passing lists, because if you're building a user-friendly program you would otherwise have to wrap each of the user's inputs in a list anyway. I think this is a fine approach, but you could also keep your original code and call it in a for loop over lists of cities and years.
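A sketch of how the argument splitting could feed the actual download, assuming the same convention (digits-only strings are years, everything else is a city). STATIONS, URL, split_args and weather_urls are hypothetical names, and lowercasing the city name is an added convenience so "Calgary" and "calgary" both work. Only weather itself touches the network; the URL building can be checked offline.

```python
import pandas as pd

# Hypothetical lookup; station IDs taken from the question's URLs
STATIONS = {"calgary": 27211, "edmonton": 27793}
URL = ("https://climate.weather.gc.ca/climate_data/bulk_data_e.html"
       "?format=csv&stationID={station}&Year={year}&Month=5&Day=1&timeframe=2&submit=Download+Data")

def split_args(*args: str):
    # digits-only arguments are years, everything else is a city name
    years = [arg for arg in args if arg.isdigit()]
    cities = [arg for arg in args if not arg.isdigit()]
    return cities, years

def weather_urls(*args: str):
    # one URL per city/year combination, in the order the cities were given
    cities, years = split_args(*args)
    return [URL.format(station=STATIONS[city.lower()], year=year)
            for city in cities for year in years]

def weather(*args: str) -> pd.DataFrame:
    # downloads every combination (requires network access) and stacks them
    frames = [pd.read_csv(url) for url in weather_urls(*args)]
    return pd.concat(frames, ignore_index=True)
```

Arguments can then be given in any order, e.g. weather("calgary", "2017", "edmonton", "2018").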

Upvotes: 1
