Sascha

Reputation: 687

Check each value in one column with each value of other column in one dataframe

I have the following dataframe:

import pandas as pd

data = {'val1': ["3.2", "2.4", "-2.3", "-4.9", "0"],
        'class': ["1", "0", "0", "0", "1"],
        'val2': ["3.2", "2.7", "1.7", "-7.1", "0"]}
df = pd.DataFrame(data)
df
    val1    class   val2
0   3.2     1       3.2
1   2.4     0       2.7
2  -2.3     0       1.7
3  -4.9     0      -7.1
4   0.0     1       0.0

I want to check two things. 1) Sign: if the value in column val1 and the value in column val2 have different signs (for example, the values at index 2), change the sign of the val2 value to match the sign of the val1 value. The desired output is:

    val1    class   val2
0   3.2     1       3.2
1   2.4     0       2.7
2  -2.3     0      -1.7
3  -4.9     0      -7.1
4   0.0     1       0.0

2) Range: check whether the value in column val2 lies within the interval from val1 - 2 to val1 + 2. For example, at index 1, 2.7 lies in the range [2.4 - 2, 2.4 + 2]. If the condition is true, I want to change class from 0 to 1. The desired output is:

    val1    class   val2
0   3.2     1       3.2
1   2.4     1       2.7
2  -2.3     1      -1.7
3  -4.9     0      -7.1
4   0.0     1       0.0

Upvotes: 2

Views: 93

Answers (3)

Gaurav Agarwal

Reputation: 611

I think this solves your problem without using any additional library:

def signfunc(x, y):
    # Keep y when x and y share a sign (or either is zero); otherwise flip y.
    if x * y >= 0:
        return y
    else:
        return -y

df['val1'] = df['val1'].astype(float)
df['val2'] = df['val2'].astype(float)
df['val2'] = df.apply(lambda x: signfunc(x.val1, x.val2), axis=1)
print(df)

df.loc[abs(df["val1"]-df["val2"])<=2, 'class'] = 1

print(df)
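
The row-wise apply above can also be written without a Python-level loop, still using only pandas. This is a minimal sketch, assuming the same columns as in the question; the sample frame is rebuilt so the snippet runs on its own, class is converted to int so the column does not end up with mixed types, and `same_sign` is just an illustrative name:

```python
import pandas as pd

# Rebuild the sample frame from the question (values stored as strings).
df = pd.DataFrame({
    'val1': ["3.2", "2.4", "-2.3", "-4.9", "0"],
    'class': ["1", "0", "0", "0", "1"],
    'val2': ["3.2", "2.7", "1.7", "-7.1", "0"],
})
df = df.astype({'val1': float, 'class': int, 'val2': float})

# True where the signs already agree (or either value is zero),
# mirroring the x*y >= 0 test in signfunc.
same_sign = df['val1'] * df['val2'] >= 0

# Keep val2 where the signs agree, otherwise flip it.
df['val2'] = df['val2'].where(same_sign, -df['val2'])

# Set class to 1 where val2 lies within 2 of val1 (inclusive).
df.loc[(df['val1'] - df['val2']).abs() <= 2, 'class'] = 1
print(df)
```

On the sample data this should give the same val2 and class columns as the desired output in the question.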

Upvotes: 0

jezrael

Reputation: 862661

First convert the values to floats if necessary, then set the sign with numpy.sign, and for the second check use Series.between:

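import numpy as np

df['val1'] = df['val1'].astype(float)
df['val2'] = df['val2'].astype(float)

# Flip val2 wherever its sign differs from the sign of val1.
df['val2'] *= np.sign(df['val1']) * np.sign(df['val2'])
# class is 1 when val2 lies between val1 - 2 and val1 + 2 (inclusive), else 0.
df['class'] = df['val2'].between(df['val1'] - 2, df['val1'] + 2).astype(int)
# Alternative:
# df['class'] = np.where(df['val2'].between(df['val1'] - 2, df['val1'] + 2), 1, 0)
print(df)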
   val1  class  val2
0   3.2      1   3.2
1   2.4      1   2.7
2  -2.3      1  -1.7
3  -4.9      0  -7.1
4   0.0      1   0.0
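
One edge case is worth checking: multiplying by the product of the signs sets val2 to 0 whenever val1 is exactly 0, because np.sign(0) is 0. This does not show up in the sample data, where the only zero in val1 is paired with a zero in val2. If the magnitude of val2 should be kept in that case, numpy.copysign is one option. A small sketch, with made-up values in a hypothetical `demo` frame to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen so that val1 contains a zero next to a nonzero val2.
demo = pd.DataFrame({'val1': [-2.3, 0.0, 4.0], 'val2': [1.7, 5.0, -1.0]})

# Sign-product approach: the row with val1 == 0 collapses to 0.
by_product = demo['val2'] * np.sign(demo['val1']) * np.sign(demo['val2'])

# copysign keeps the magnitude of val2 and takes the sign of val1;
# a val1 of +0.0 counts as positive.
by_copysign = np.copysign(demo['val2'], demo['val1'])

print(by_product.tolist())   # [-1.7, 0.0, 1.0]
print(by_copysign.tolist())  # [-1.7, 5.0, 1.0]
```

Which behaviour is right depends on how a zero in val1 should be treated, which the question does not specify.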

Upvotes: 3

Phung Duy Phong

Reputation: 896

Try this:

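import numpy as np

# val1 and val2 must be numeric; convert them first if they are stored as strings.
df['val1'] = df['val1'].astype(float)
df['val2'] = df['val2'].astype(float)

# Check 1: give val2 the sign of val1.
df['val2'] = df.apply(lambda x: np.sign(x['val1']) * np.sign(x['val2']) * x['val2'], axis=1)

# Check 2: class is 1 when val2 is less than 2 away from val1, else 0.
df['class'] = df.apply(lambda x: int(abs(x['val1'] - x['val2']) < 2), axis=1)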
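
Note that `< 2` excludes a difference of exactly 2, whereas Series.between includes both endpoints by default. The question does not say which is wanted, so the following is only a sketch of the difference, using made-up values in a hypothetical `edge` frame:

```python
import pandas as pd

# Hypothetical rows: differences of exactly 2, less than 2, and more than 2.
edge = pd.DataFrame({'val1': [1.0, 1.0, 1.0], 'val2': [3.0, 2.5, 3.5]})

# Strict comparison, as in the lambda above.
strict = ((edge['val1'] - edge['val2']).abs() < 2).astype(int)

# Inclusive comparison, the default behaviour of Series.between.
inclusive = edge['val2'].between(edge['val1'] - 2, edge['val1'] + 2).astype(int)

print(strict.tolist())     # [0, 1, 0]
print(inclusive.tolist())  # [1, 1, 0]
```

The two versions only disagree on rows where the difference is exactly 2; the sample data has no such row.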

Upvotes: 2
