Reputation: 4546
I have a function which plots the log of two columns from a Pandas DataFrame. Zeros therefore cause an error and need to be removed. At the moment the input to the function is two columns from a DataFrame. Is there a way to remove any rows containing zeros? For example, an equivalent of df = df[df.ColA != 0]:
def logscatfit(x, y, title):
    xvals2 = np.arange(-2, 6, 1)
    a = np.log(x)  # These are what I want to remove the zeros from
    b = np.log(y)
    plt.scatter(a, b, c='g', marker='x', s=35)
    slope, intercept, r_value, p_value, std_err = stats.linregress(a, b)
    plt.plot(xvals2, (xvals2*slope + intercept), color='red')
    plt.title(title)
    plt.show()
    print "Slope is:",slope, ". Intercept is:",intercept,". R-value is:",r_value,". P-value is:",p_value,". Std_err is:",std_err
I can't think of a way of removing the zeros in both a and b while keeping them the same length so that I can plot a scatter graph. Is my only option to rewrite the function to take a DataFrame and then remove the zeros with something like df1 = df[df.ColA != 0] followed by df2 = df1[df1.ColB != 0]?
Upvotes: 0
Views: 94
Reputation: 12039
I like FooBar's answer for its simplicity. A more general approach is to pass the DataFrame to your function and use the .any() method to find the rows that contain a zero:
def logscatfit(df, x_col_name, y_col_name, title):
    two_cols = df[[x_col_name, y_col_name]]
    # True for every row that has a zero in either column
    mask = two_cols.apply(lambda row: (row == 0).any(), axis=1)
    # keep only the rows without zeros
    df_to_use = df[~mask]
    x = df_to_use[x_col_name]
    y = df_to_use[y_col_name]
    # your code
    a = np.log(x)
    # etc.
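The row-wise check can also be written without apply. The following is only a minimal sketch: the helper name drop_zero_rows and the sample data are made up for illustration, and ColA and ColB are the column names used in the question.

```python
import pandas as pd

def drop_zero_rows(df, cols):
    """Return the rows of df that have no zero in any of the given columns.

    drop_zero_rows is a hypothetical helper name, not part of pandas.
    """
    # Boolean Series: True where at least one of the chosen columns is zero
    has_zero = (df[cols] == 0).any(axis=1)
    # Invert the mask so that only zero-free rows are kept
    return df[~has_zero]

# Made-up sample data; ColC is deliberately all zeros and is not checked
df = pd.DataFrame({
    "ColA": [1, 0, 3, 4],
    "ColB": [5, 6, 0, 8],
    "ColC": [0, 0, 0, 0],
})
clean = drop_zero_rows(df, ["ColA", "ColB"])
```

Because the filter is applied to whole rows, the two columns taken from clean stay the same length and stay aligned, which is what the scatter plot needs.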
Upvotes: 1
Reputation: 16518
As I understand your question, you need to remove rows where x or y (or both) are zero.
A simple approach is
keepThese = (x > 0) & (y > 0)
a = x[keepThese]
b = y[keepThese]
and then proceed with your code.
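To see why this keeps a and b the same length, here is a small sketch with made-up data, assuming x and y are pandas Series that share an index (as two columns of the same DataFrame do). Note that > 0 drops negative values as well as zeros, which is what you want before taking a log.

```python
import numpy as np
import pandas as pd

# Made-up sample columns; in the question these come from a DataFrame
x = pd.Series([1.0, 0.0, 10.0, 100.0, 5.0])
y = pd.Series([2.0, 3.0, 0.0, 50.0, 7.0])

# One shared mask: True only where both values are strictly positive
keepThese = (x > 0) & (y > 0)

# Applying the same mask to both Series drops the same rows from each
a = np.log(x[keepThese])
b = np.log(y[keepThese])
```

Since both Series are filtered with the same mask, every remaining point in a still has its partner in b.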
Upvotes: 2
Reputation: 11717
Inserting FooBar's answer into your function gives:
def logscatfit(x, y, title):
    xvals2 = np.arange(-2, 6, 1)
    # keep only the rows where both values are positive
    keepThese = (x > 0) & (y > 0)
    a = x[keepThese]
    b = y[keepThese]
    a = np.log(a)
    b = np.log(b)
    plt.scatter(a, b, c='g', marker='x', s=35)
    slope, intercept, r_value, p_value, std_err = stats.linregress(a, b)
    plt.plot(xvals2, (xvals2*slope + intercept), color='red')
    plt.title(title)
    plt.show()
    print "Slope is:",slope, ". Intercept is:",intercept,". R-value is:",r_value,". P-value is:",p_value,". Std_err is:",std_err
Upvotes: 0