kms

Reputation: 2014

Modify a Pandas dataframe inside a python function

I need to append a row to a pandas DataFrame inside a function, using the values passed as arguments.

import pandas as pd

# Declare global DataFrame 
global df 
df = pd.DataFrame([['1','2','3']], columns=['x','y','z'])

def append_row(a,b,c):

    vlist = [a,b,c]
    cols = ['x','y','z']

    # using zip() to convert lists to dictionary
    res = dict(zip(cols, vlist))

    # Create pandas DataFrame for new row addition
    df = df.append(res, ignore_index=True)

    print("New row added", df.tail(1))

    return df

Expected Output:

New row appended to `df`

x y z
1 2 3
a b c

When I run this code, I get this error (Python 3):

UnboundLocalError: local variable `df` referenced before assignment

How can I modify a pandas DataFrame and add a new row when the DataFrame is read outside the function?

Additional context: The function is called from a different script, and the DataFrame is read in the same script where the function is declared.

Upvotes: 0

Views: 2556

Answers (4)

positive.definite

Reputation: 116

There are two issues:

  1. Usage of global keyword

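A function can read a global variable, but it cannot rebind one unless the name is declared global inside the function. Because append_row assigns to df, Python treats df as a local variable throughout the function body, so the df.append(...) call on the right-hand side reads that local before it has a value, which raises the UnboundLocalError. The global statement at module level has no effect. In general, usage of globals in Python is discouraged. Check out this thread.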

  2. Appending a DataFrame

df.append() accepts a Series or a dict. Both need column names, which I presume is why you decided to wrap it in a function. Ideally you'd change your input type to be a Series or dict and avoid hardcoding column names.

However, I ran into the same problem when I could not modify the input easily. This is the most explicit solution that I could think of:

def append_row(dataframe, args):
    row = dict(zip(dataframe.columns.to_list(), args))
    return dataframe.append(row, ignore_index=True)

# usage (no global statement needed at module level)
df = pd.DataFrame([['1','2','3']], columns=['x','y','z'])
df = append_row(df, [4,5,6])
df = append_row(df, [7, '8 as text', [9, 'in a list']])
print(df)

And this solution uses argument unpacking (*args), so the values can be passed as separate arguments, as in your original code sample:

def append_row(dataframe, *args):
    row = dict(zip(dataframe.columns.to_list(), args))
    return dataframe.append(row, ignore_index=True)

# usage (no global statement needed at module level)
df = pd.DataFrame([['1','2','3']], columns=['x','y','z'])
df = append_row(df, 4, 5, 6)
df = append_row(df, 7, '8 as text', [9, 'in a list'])
print(df)

Both produce the same output:

   x          y               z
0  1          2               3
1  4          5               6
2  7  8 as text  [9, in a list]
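
A note for newer pandas versions: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the functions above raise an AttributeError there. Below is a minimal sketch of the same helper built on pd.concat. It keeps the append_row name and *args signature from this answer; the new_row variable is only illustrative.

```python
import pandas as pd


def append_row(dataframe, *args):
    # Pair each positional value with the matching column name.
    row = dict(zip(dataframe.columns, args))
    # A list holding one dict becomes a one-row DataFrame.
    new_row = pd.DataFrame([row], columns=dataframe.columns)
    # concat returns a new DataFrame; the input is left unchanged.
    return pd.concat([dataframe, new_row], ignore_index=True)


df = pd.DataFrame([['1', '2', '3']], columns=['x', 'y', 'z'])
df = append_row(df, 4, 5, 6)
df = append_row(df, 7, '8 as text', [9, 'in a list'])
print(df)
```

As with the versions above, the result has to be assigned back to df, because nothing is modified in place.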

Hope this helps @kms. Happy Pythoning :)

Upvotes: 4

k33da_the_bug

Reputation: 820

Put the global statement inside the function. However, modifying global state is bad programming practice, because it makes the code harder to debug later on.

import pandas as pd

# Declare DataFrame  
df = pd.DataFrame([['1','2','3']], columns=['x','y','z'])

def append_row(a,b,c):

    vlist = [a,b,c]
    cols = ['x','y','z']

    # using zip() to convert lists to dictionary
    res = dict(zip(cols, vlist))

    # Create pandas DataFrame for new row addition and assign to global df
    global df

    df = df.append(res, ignore_index=True)

    print("New row added", df.tail(1))

    return df

append_row('a','b','c')
df
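
The effect of where the global statement is placed can be checked without pandas. This is a small sketch, where counter, broken, and fixed are made-up names standing in for df and the two versions of append_row:

```python
counter = 0  # module-level name, standing in for the global DataFrame


def broken():
    # The assignment makes `counter` local to the whole function body,
    # so the read on the right-hand side happens before a local value exists.
    counter = counter + 1
    return counter


def fixed():
    # `global` makes assignments in this function rebind the module-level name.
    global counter
    counter = counter + 1
    return counter


try:
    broken()
except UnboundLocalError as exc:
    error_name = type(exc).__name__
    print("broken() raised", error_name)

print("fixed() returned", fixed())
```

The same rule applies to df = df.append(...): the assignment target decides the scope, and the read on the right-hand side fails because of it.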

Upvotes: 1

NotAName

Reputation: 4322

If you want to insert row-by-row you can simply add new values as a tuple:

def append_row(a, b, c):
    global df
    df.loc[df.shape[0], :] = a, b, c
    return df

On the other hand, since you are returning df anyway, I see no reason it should be a global. You can pass the DataFrame as an argument to your function, along with a tuple of new values:

def append_row(df: pd.DataFrame, new_data: tuple) -> pd.DataFrame:
    df.loc[df.shape[0], :] = new_data
    return df
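
As a rough check that no global is needed: assigning through .loc to a label that does not exist yet enlarges the DataFrame in place, so the caller's object changes even if the return value is ignored. This sketch assumes the default 0..n-1 index and uses len(frame) in place of df.shape[0]; frame and result are illustrative names.

```python
import pandas as pd


def append_row(frame, new_data):
    # Setting a label that does not exist yet adds a row in place.
    # Assumes the default 0..n-1 index: with any other index, len(frame)
    # may already be a label, and the assignment would overwrite that row.
    frame.loc[len(frame)] = list(new_data)
    return frame


df = pd.DataFrame([['1', '2', '3']], columns=['x', 'y', 'z'])
result = append_row(df, ('a', 'b', 'c'))

print(result is df)  # the function returns the same object it was given
print(df)
```

Because the original object is mutated, any other reference to the same DataFrame sees the new row too, which may or may not be what you want.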

Upvotes: 1

Gopal Gautam

Reputation: 369

The global df statement should be inside the function:

df = pd.DataFrame([['1','2','3']], columns=['x','y','z'])

def append_row(a,b,c):
    global df
    vlist = [a,b,c]
    cols = ['x','y','z']

    # using zip() to convert lists to dictionary
    res = dict(zip(cols, vlist))

    # Create pandas DataFrame for new row addition
    df = df.append(res, ignore_index=True)

    print("New row added", df.tail(1))

    return df

append_row(1,2,3)

Upvotes: 1
