Henri
Henri

Reputation: 1235

Loop a function and bind together Pandas dataframe

I have a function that connects to an SQL database and fetch data from a table. I aim to loop that function over an iterator to make the same query on different tables. The function works, but the for loop below do not return anything. I'm new to Python and I'm sure I miss something fundamental here.

Example Code-

def func(years):
  
    conn = pyodc.connect()
      sql_query = """ SELECT TOP 100 X
                      FROM table_""" + years
      df = pd.Dataframe()
      if len(df) == 0:
         df = pd.read_sql(sql_query, conn)
         df['year'] = years
      else:
         df_temp = df.copy()
         temp = pd.read_sql_query)sql_query, conn)
         temp['year'] = years
         df = temp.append(df_temp)
    
      return df

loop = ['2017', '2018']

for year in loop:
    func(year)

Upvotes: 1

Views: 613

Answers (2)

BernardL
BernardL

Reputation: 5434

You need to pass an argument to the function for it to accept and loop over. Also you might want to consider pd.concat so you can assign a variable to your dataframe.

So something like this would work and is cleaner:

def select_top(year):
    conn = pyodc.connect()
    sql_query = "SELECT TOP 100 X FROM table_{}".format(year)

    df = pd.read_sql(sql_query, conn)
    df['year'] = years

    return df

years = ['2017', '2018']

data = pd.concat([select_top(year) for year in years])

It will loop over the years and concatenates all your dataframes into 1.

Upvotes: 1

Shradha
Shradha

Reputation: 2442

You need to pass year as an argument to the func function as follows-

def func(year):
      conn = pyodc.connect()
      sql_query = """ SELECT TOP 100 X
                      FROM table_{}""".format(years)
      df = pd.Dataframe()
      if len(df) == 0:
         df = pd.read_sql(sql_query, conn)
         df['year'] = years
      else:
         df_temp = df.copy()
         temp = pd.read_sql_query(sql_query, conn)
         temp['year'] = years
         df = temp.append(df_temp)

      return df

loop = ['2017', '2018']

for year in loop:
    func(year)

For more details on passing arguments to functions, refer here.

Your code has some other logical errors too-

  1. You're creating an empty dataframe before the if condition. It'll always set the if condition True.
  2. You're not assigning the return value of your function to any variable.
  3. You might need to use df_temp instead of df at some places.
  4. You're not passing df as a function parameter (if you wish to modify it in the function). For doing so, you'll need to create an empty dataframe before the for loop as well.

Upvotes: 1

Related Questions