Reputation: 2014
I need to append a row to a pandas DataFrame inside a function, using the values that are passed as arguments.
import pandas as pd
# Declare global DataFrame
global df
df = pd.DataFrame([['1','2','3']], columns=['x','y','z'])
def append_row(a,b,c):
    vlist = [a,b,c]
    cols = ['x','y','z']
    # using zip() to convert lists to dictionary
    res = dict(zip(cols, vlist))
    # Create pandas DataFrame for new row addition
    df = df.append(res, ignore_index=True)
    print("New row added", df.tail(1))
    return df
Expected Output:
New row appended to `df`
x y z
1 2 3
a b c
When I run this code, I get an error:
Python 3: UnboundLocalError: local variable `df` referenced before assignment.
How can I modify a pandas DataFrame and add a new row by referencing a DataFrame that is read outside the function?
Additional context: The function is called from a different script, and the DataFrame is read in the same script as the function declaration.
Upvotes: 0
Views: 2556
Reputation: 116
There are two issues:
1. A function can read a global variable, but it cannot rebind one without declaring it `global`. Because `df` is assigned inside the function, Python treats `df` as a local variable throughout the function body, so the `df.append(...)` on the right-hand side reads that local name before it has a value. That is what raises the UnboundLocalError. In general, usage of globals in Python is discouraged. Check out this thread.
2. `df.append()` accepts a Series or a dict (among other types). Both require column names, which I presume is why you decided to wrap it in a function. Ideally you'd change your input type to be a Series or dict and avoid hardcoding column names.
However, I ran into the same problem when I could not modify the input easily. This is the most explicit solution that I could think of:
import pandas as pd

def append_row(dataframe, args):
    # Pair each value with the matching column name
    row = dict(zip(dataframe.columns.to_list(), args))
    return dataframe.append(row, ignore_index=True)

#usage (no `global` statement needed at module level)
df = pd.DataFrame([['1','2','3']], columns=['x','y','z'])
df = append_row(df, [4,5,6])
df = append_row(df, [7, '8 as text', [9, 'in a list']])
print(df)
And this solution uses list unpacking and allows multiple input variables as in your original code sample:
def append_row(dataframe, *args):
    # *args collects the positional values into a tuple
    row = dict(zip(dataframe.columns.to_list(), args))
    return dataframe.append(row, ignore_index=True)

#usage (no `global` statement needed at module level)
df = pd.DataFrame([['1','2','3']], columns=['x','y','z'])
df = append_row(df, 4, 5, 6)
df = append_row(df, 7, '8 as text', [9, 'in a list'])
print(df)
Both produce the same output:
   x          y               z
0  1          2               3
1  4          5               6
2  7  8 as text  [9, in a list]
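A note for newer pandas: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the calls above raise an AttributeError there. Below is a minimal sketch of the same helper built on `pd.concat` instead. This is a swapped-in technique rather than the code from the answer itself, and it assumes the number of values matches the number of columns.

```python
import pandas as pd


def append_row(dataframe, *args):
    # Pair each positional value with the matching column name.
    row = dict(zip(dataframe.columns, args))
    # Wrapping the dict in a list builds a one-row DataFrame.
    new_row = pd.DataFrame([row])
    # concat returns a new DataFrame; the input is left untouched.
    return pd.concat([dataframe, new_row], ignore_index=True)


df = pd.DataFrame([['1', '2', '3']], columns=['x', 'y', 'z'])
df = append_row(df, 4, 5, 6)
df = append_row(df, 7, '8 as text', [9, 'in a list'])
print(df)
```

As with the versions above, the caller has to reassign the result (`df = append_row(df, ...)`), because nothing is modified in place.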
Hope this helps @kms. Happy Pythoning :)
Upvotes: 4
Reputation: 820
Put the `global` statement inside the function. However, it's bad programming practice to modify global state, as it makes the code harder to debug later on.
import pandas as pd
# Declare DataFrame
df = pd.DataFrame([['1','2','3']], columns=['x','y','z'])
def append_row(a,b,c):
    vlist = [a,b,c]
    cols = ['x','y','z']
    # using zip() to convert lists to dictionary
    res = dict(zip(cols, vlist))
    # Create pandas DataFrame for new row addition and assign to global df
    global df
    df = df.append(res, ignore_index=True)
    print("New row added", df.tail(1))
    return df
append_row('a','b','c')
df
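To see why the `global` statement makes the difference, here is a minimal pandas-free sketch of the scoping rule. The names `counter`, `items`, `broken`, `fixed` and `mutate` are hypothetical and only serve the illustration.

```python
counter = 0
items = []


def broken():
    # Assigning to `counter` makes it local for the whole function body,
    # so the read on the right-hand side fails with UnboundLocalError.
    counter = counter + 1


def fixed():
    # `global` tells Python to rebind the module-level name instead.
    global counter
    counter = counter + 1


def mutate():
    # The name `items` is never assigned here, only mutated in place,
    # so no `global` statement is needed.
    items.append("row")


try:
    broken()
except UnboundLocalError as exc:
    print("broken() raised:", type(exc).__name__)

fixed()
mutate()
print(counter, items)
```

The original `df = df.append(...)` is the `broken()` case: it rebinds the name, so it needs `global df` or, better, the DataFrame passed in as an argument.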
Upvotes: 1
Reputation: 4322
If you want to insert row-by-row you can simply add new values as a tuple:
def append_row(a, b, c):
    global df
    df.loc[df.shape[0], :] = a, b, c
    return df
On the other hand, since you are returning df anyway, I see no reason it should be a global. You can pass the DataFrame and a tuple of new values as arguments to your function:
def append_row(df: pd.DataFrame, new_data: tuple) -> pd.DataFrame:
    df.loc[df.shape[0], :] = new_data
    return df
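One thing to keep in mind with this version: assigning through `.loc` modifies the DataFrame that was passed in, so the caller's object changes even if the return value is ignored. A minimal sketch of that behaviour, assuming the DataFrame has the default 0..n-1 integer index (with any other index, `len(df)` may match an existing label and overwrite that row). The name `table` is hypothetical.

```python
import pandas as pd


def append_row(df: pd.DataFrame, new_data: tuple) -> pd.DataFrame:
    # With a default RangeIndex, len(df) is the first unused label,
    # so this assignment enlarges the frame by one row in place.
    df.loc[len(df)] = list(new_data)
    return df


table = pd.DataFrame([['1', '2', '3']], columns=['x', 'y', 'z'])
append_row(table, ('a', 'b', 'c'))  # return value deliberately ignored
print(table)
```

Because the change happens in place, this variant works without `global` and without reassigning the result, which also suits a function that is called from another script.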
Upvotes: 1
Reputation: 369
`global df` should be inside the function:
df = pd.DataFrame([['1','2','3']], columns=['x','y','z'])
def append_row(a,b,c):
    global df
    vlist = [a,b,c]
    cols = ['x','y','z']
    # using zip() to convert lists to dictionary
    res = dict(zip(cols, vlist))
    # Create pandas DataFrame for new row addition
    df = df.append(res, ignore_index=True)
    print("New row added", df.tail(1))
    return df
append_row(1,2,3)
Upvotes: 1