Reputation: 4546
Could somebody explain why the following code produces a NameError?
def nonull(df, col, name):
    name = df[pd.notnull(df[col])]
    print(name[col].count(), df[col].count())
    return name

nonull(sve, 'DOC_mg/L', 'sveDOC')
sveDOC.count()
711 711
NameError: name 'sveDOC' is not defined
The DataFrame seems to be created, since the print statement works, so I don't understand why I get an error when I try to use sveDOC (which was name inside the function).
Here's an example of what I'd like to do within the function:
import pandas as pd
d = {'one': pd.Series([1., 1., 1., 1.], index=['a', 'b', 'c', 'd']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
pd.DataFrame(d)
df = pd.DataFrame(d)
df1 = df
df = df * 2
print(df.head())
print(df1.head())
   one  two
a    2    2
b    2    4
c    2    6
d    2    8
   one  two
a    1    1
b    1    2
c    1    3
d    1    4
Upvotes: 1
Views: 7193
Reputation: 122107
Python names do not work the way you seem to think. Here's what your code actually does:
def nonull(df, col, name):          # on entry, 'name' refers to the string 'sveDOC'
    name = df[pd.notnull(df[col])]  # rebind the local name 'name' to a new DataFrame
    print(name[col].count(), df[col].count())
    return name                     # return the new DataFrame
nonull(sve, 'DOC_mg/L', 'sveDOC')   # call the function and ignore the return value
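Put differently, assigning to a parameter inside a function only rebinds a local variable. It never creates a variable in the caller's scope named after a string you happened to pass in. Here is a minimal, pandas-free sketch of that behaviour; the function `rebind` and the variable `label` are hypothetical, chosen only for illustration:

```python
def rebind(name):
    # On entry, the local name 'name' refers to whatever the caller passed in
    name = [1, 2, 3]  # rebinding changes only this local name
    return name

label = 'sveDOC'
result = rebind(label)

print(label)                  # prints: sveDOC  (the caller's variable is untouched)
print(result)                 # prints: [1, 2, 3]  (new object reachable only via the return value)
print('sveDOC' in globals())  # prints: False  (no variable called sveDOC was created)
```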
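The function never actually uses the 'sveDOC' argument: its value is replaced straight away, and the filtered DataFrame is only available through the return value, which your call discards. Here's what you should actually do: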
def nonull(df, col):
    name = df[pd.notnull(df[col])]
    print(name[col].count(), df[col].count())
    return name

sveDOC = nonull(sve, 'DOC_mg/L')
sveDOC.count()
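A self-contained version you can run follows; the small `sve` DataFrame is made-up stand-in data, not your real data. It also shows why you saw `711 711`: `count()` already skips null values, so the two counts printed inside the function are always equal.

```python
import pandas as pd

def nonull(df, col):
    # Keep only the rows where `col` is not null; boolean indexing builds a new DataFrame
    name = df[pd.notnull(df[col])]
    print(name[col].count(), df[col].count())  # both counts exclude nulls
    return name

# Hypothetical stand-in for the asker's `sve` data (None becomes NaN)
sve = pd.DataFrame({'DOC_mg/L': [1.2, None, 3.4, 5.6]})

sveDOC = nonull(sve, 'DOC_mg/L')  # prints: 3 3
print(len(sve), len(sveDOC))      # prints: 4 3  (only the null row was dropped)
```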
Your second example shows the same misconception about how Python uses names and references:
pd.DataFrame(d)       # creates a new DataFrame but doesn't do anything with it
                      # (what was the point of this line?)
df = pd.DataFrame(d)  # assigns a second new DataFrame to the name 'df'
df1 = df              # binds the name 'df1' to the same object that 'df' refers to
                      # - note that this does *not* create a copy
df = df * 2           # creates a new DataFrame based on the one referenced by 'df'
                      # (and 'df1'!) and binds it to the name 'df'
To demonstrate this:
df1 = pd.DataFrame(d)
df2 = df1
df1 is df2
Out[5]: True # still the same object
df2 = df2 * 2
df1 is df2
Out[7]: False # now different
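The flip side of aliasing is that an in-place change made through one name is visible through the other, as long as both names still refer to the same object; only rebinding (as with `df2 = df2 * 2`) separates them. A minimal sketch reusing the `d` dictionary from the question; zeroing the 'one' column is just an arbitrary in-place change:

```python
import pandas as pd

d = {'one': pd.Series([1., 1., 1., 1.], index=['a', 'b', 'c', 'd']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df1 = pd.DataFrame(d)
df2 = df1                     # alias: both names refer to one object
df2['one'] = 0.0              # modifies that shared object in place
print(df1['one'].tolist())    # prints: [0.0, 0.0, 0.0, 0.0]  (visible through df1)

df2 = df2 * 2                 # rebinding: df2 now names a brand-new DataFrame
print(df1 is df2)             # prints: False
print(df1['two'].tolist())    # prints: [1.0, 2.0, 3.0, 4.0]  (df1 is unaffected)
```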
If you want to create a copy of a DataFrame, do so explicitly:
df2 = df1.copy()
You can either do this outside nonull and pass the copy in, or do it inside nonull and return the modified copy.
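A minimal sketch of the second option, copying inside the function and returning the copy. The helper `doubled_copy` and its column-doubling step are hypothetical, standing in for whatever modification you actually need:

```python
import pandas as pd

def doubled_copy(df, col):
    # Hypothetical helper: work on an explicit copy so the caller's DataFrame is untouched
    out = df.copy()
    out[col] = out[col] * 2  # modifies only the copy
    return out

df1 = pd.DataFrame({'one': [1.0, 1.0], 'two': [1.0, 2.0]})
df2 = doubled_copy(df1, 'two')

print(df1['two'].tolist())  # prints: [1.0, 2.0]  (original unchanged)
print(df2['two'].tolist())  # prints: [2.0, 4.0]
```

Copying inside the function keeps callers safe by default; copying outside leaves the choice to the caller when the extra memory matters.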
Upvotes: 2