Reputation: 4546

Removing entire rows that contain a zero in two Pandas series

I have a function which plots the log of two columns from a Pandas DataFrame. Since log(0) is undefined (np.log(0) returns -inf), zeros cause problems and need to be removed. At the moment the input to the function is two columns from a DataFrame. Is there a way to remove any rows containing zeros, i.e. something equivalent to df = df[df.ColA != 0] but applied to both columns?

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def logscatfit(x, y, title):
    xvals2 = np.arange(-2, 6, 1)
    a = np.log(x)  # These are what I want to remove the zeros from
    b = np.log(y)
    plt.scatter(a, b, c='g', marker='x', s=35)
    slope, intercept, r_value, p_value, std_err = stats.linregress(a, b)
    plt.plot(xvals2, xvals2 * slope + intercept, color='red')
    plt.title(title)
    plt.show()
    print("Slope is:", slope, ". Intercept is:", intercept, ". R-value is:", r_value,
          ". P-value is:", p_value, ". Std_err is:", std_err)

I can't think of a way of removing the zeros from both a and b while keeping them the same length, which I need in order to plot a scatter graph. Is my only option to rewrite the function to take a DataFrame and then remove the zeros with something like df1 = df[df.ColA != 0] followed by df2 = df1[df1.ColB != 0]?
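For reference, the two-step filter described in the question can be collapsed into a single boolean mask by combining both conditions with `&`. A minimal sketch, using the placeholder column names ColA and ColB from the question and made-up data:

```python
import pandas as pd

# Hypothetical data; ColA and ColB are the placeholder names from the question
df = pd.DataFrame({"ColA": [1.0, 0.0, 2.5, 4.0],
                   "ColB": [3.0, 5.0, 0.0, 8.0]})

# Two-step version from the question
df1 = df[df.ColA != 0]
df2 = df1[df1.ColB != 0]

# Equivalent single mask: keep rows where neither column is zero
df_nonzero = df[(df.ColA != 0) & (df.ColB != 0)]

print(df_nonzero)
```

The parentheses around each comparison are required, because `&` binds more tightly than `!=` in Python.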

Upvotes: 0

Answers (3)

exp1orer

Reputation: 12039

I like FooBar's answer for its simplicity. A more general approach is to pass the DataFrame to your function and use the .any() method.

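import numpy as np

def logscatfit(df, x_col_name, y_col_name, title):
    two_cols = df[[x_col_name, y_col_name]]
    # True for every row where at least one of the two columns is zero
    has_zero = (two_cols == 0).any(axis=1)
    df_to_use = df[~has_zero]  # invert the mask: keep only rows without zeros
    x = df_to_use[x_col_name]
    y = df_to_use[y_col_name]

    # your code
    a = np.log(x)
    b = np.log(y)
    # ... rest of the plotting/regression code as before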
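The mask built with .any() marks the rows that contain a zero, so it has to be inverted with `~` before selecting the rows to log. A small sketch on made-up data (the column names x and y are hypothetical); `(two_cols == 0).any(axis=1)` gives the same result as the row-wise `apply` with a lambda, but is vectorized:

```python
import pandas as pd

# Hypothetical data with zeros in rows 1 and 2
df = pd.DataFrame({"x": [1.0, 0.0, 2.0, 3.0],
                   "y": [4.0, 5.0, 0.0, 6.0]})

two_cols = df[["x", "y"]]
# True where at least one of the two columns is zero in that row
has_zero = (two_cols == 0).any(axis=1)

rows_with_zeros = df[has_zero]   # selecting with the mask directly keeps the zero rows
rows_to_use = df[~has_zero]      # the inverted mask keeps the rows safe to log

print(rows_to_use)
```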

Upvotes: 1

FooBar

Reputation: 16518

As I understand your question, you need to remove rows where x, y, or both are zero.

A simple approach is

keepThese = (x > 0) & (y > 0)  # True only where both values are positive
a = x[keepThese]
b = y[keepThese]

and then proceed with your code.
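Because the same boolean mask indexes both Series, a and b stay the same length and aligned by index. A quick check with made-up values; note that `> 0` also drops negative values, which np.log could not handle either:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a zero in x, a zero in y, and a negative x
x = pd.Series([1.0, 0.0, 10.0, -2.0, 100.0])
y = pd.Series([2.0, 3.0, 0.0, 5.0, 20.0])

keepThese = (x > 0) & (y > 0)
a = np.log(x[keepThese])
b = np.log(y[keepThese])

print(len(a), len(b))
```

The same mask also works if the function receives plain NumPy arrays instead of Series, since boolean indexing on arrays behaves the same way.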

Upvotes: 2

Guillaume Jacquenot

Reputation: 11717

Inserting FooBar's answer into your function gives:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def logscatfit(x, y, title):
    xvals2 = np.arange(-2, 6, 1)
    keepThese = (x > 0) & (y > 0)  # drop rows where x or y is zero (or negative)
    a = x[keepThese]
    b = y[keepThese]
    a = np.log(a)
    b = np.log(b)
    plt.scatter(a, b, c='g', marker='x', s=35)
    slope, intercept, r_value, p_value, std_err = stats.linregress(a, b)
    plt.plot(xvals2, xvals2 * slope + intercept, color='red')
    plt.title(title)
    plt.show()
    print("Slope is:", slope, ". Intercept is:", intercept, ". R-value is:", r_value,
          ". P-value is:", p_value, ". Std_err is:", std_err)
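One possible refinement, offered as a sketch rather than part of the original answer: split the filtering and regression out of the plotting function so the fit can be checked without opening a plot window. The helper name log_linregress is hypothetical, and the data below is made up to follow y = 3 * x**2 exactly, so the fit on the log scale should have slope 2 and intercept log(3):

```python
import numpy as np
import pandas as pd
from scipy import stats

def log_linregress(x, y):
    """Hypothetical helper: drop rows where x or y is not positive,
    then regress log(y) on log(x)."""
    keep = (x > 0) & (y > 0)
    a = np.log(x[keep])
    b = np.log(y[keep])
    return a, b, stats.linregress(a, b)

# Made-up data following y = 3 * x**2, plus a first row with x == 0
x = pd.Series([0.0, 1.0, 2.0, 4.0, 8.0])
y = pd.Series([5.0, 3.0, 12.0, 48.0, 192.0])

a, b, result = log_linregress(x, y)
print(result.slope, result.intercept, result.rvalue)
```

The plotting function can then call this helper and pass a, b and the returned slope and intercept to plt.scatter and plt.plot.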

Upvotes: 0
