JSells

Reputation: 95

Is it possible to use a custom filter function in pandas?

Can I use my helper function, which determines whether a shot was a three-pointer, as a filter function in pandas? My actual function is much more complex, but I simplified it for this question.

def isThree(x, y):
    return (x + y == 3)

print(data[isThree(data['x'], data['y'])].head())

Upvotes: 9

Views: 11530

Answers (5)

John Langford

Reputation: 1445

I'm fairly new to Python, which may be why I had trouble getting the other proposed solutions to work, but this worked for me:

# Assumes data has the default 0..n-1 integer index.
for index in range(len(data.index)):
    x = data.loc[index, 'x']
    y = data.loc[index, 'y']

    # Drop every row that fails the check.
    if not isThree(x, y):
        data = data.drop(index)
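
The same row-by-row check can also be sketched without dropping rows inside the loop: collect one True/False per row, then filter once. This does not depend on the index labels being 0..n-1. The data below is made up for illustration and is not from the question.

```python
import pandas as pd

def isThree(x, y):
    return x + y == 3

# Hypothetical data with a non-default index, for illustration only.
data = pd.DataFrame({'x': [1, 2, 2], 'y': [2, 2, 1]}, index=['a', 'b', 'c'])

# Evaluate the scalar check once per row, then filter in a single step.
keep = [bool(isThree(x, y)) for x, y in zip(data['x'], data['y'])]
data = data[keep]
```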

Upvotes: 0

Nathaniel

Reputation: 3290

Yes:

import numpy as np
import pandas as pd

data = pd.DataFrame({'x': np.random.randint(1,3,10),
                     'y': np.random.randint(1,3,10)})
print(data)

Output:

   x  y
0  1  2
1  2  1
2  2  1
3  1  2
4  2  1
5  2  1
6  2  1
7  2  1
8  2  1
9  2  2

def isThree(x, y):
    return (x + y == 3)

print(data[isThree(data['x'], data['y'])].head())

Output:

   x  y
0  1  2
1  2  1
2  2  1
3  1  2
4  2  1

Upvotes: 10

ALollz

Reputation: 59579

Yes, so long as your function returns a Boolean Series with the same index, you can slice your original DataFrame with the output. In this simple example, we can pass the columns (each a Series) straight to your function:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 4, (30, 2)))
def isThree(x, y):
    return x + y == 3

df[isThree(df[0], df[1])]
#    0  1
#2   2  1
#5   2  1
#9   0  3
#11  2  1
#12  0  3
#13  2  1
#27  3  0
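
If the real function only works on scalars (for example, because it contains `if` branches), one option is to build the Boolean Series row by row with `DataFrame.apply` and `axis=1`. A minimal sketch, assuming a hypothetical scalar helper `is_three_scalar` and made-up data:

```python
import pandas as pd

# Hypothetical data, not from the question.
df = pd.DataFrame({'x': [1, 2, 4, 0], 'y': [2, 1, 2, 3]})

def is_three_scalar(x, y):
    # Stand-in for a more complex function that expects plain numbers.
    if x < 0 or y < 0:
        return False
    return x + y == 3

# apply(..., axis=1) calls the function once per row and returns a
# Boolean Series that shares df's index, so it can be used as a mask.
mask = df.apply(lambda row: is_three_scalar(row['x'], row['y']), axis=1)
filtered = df[mask]
```

This is slower than passing whole columns, since the function runs once per row, so it is probably worth it only when the logic cannot be written with column operations.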

Upvotes: 3

Kartikeya Sharma

Reputation: 1383

You can use np.vectorize. The documentation is here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

import numpy as np
import pandas as pd

def isThree(x, y):
    return (x + y == 3)

df = pd.DataFrame({'A': [1, 2], 'B': [2, 0]})
df['new_column'] = np.vectorize(isThree)(df['A'], df['B'])
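
To filter rather than add a column, the array returned by np.vectorize can be used directly as a mask. A short sketch on the same two-row frame (the names `mask` and `threes` are only for illustration):

```python
import numpy as np
import pandas as pd

def isThree(x, y):
    return x + y == 3

df = pd.DataFrame({'A': [1, 2], 'B': [2, 0]})

# np.vectorize wraps the scalar function in a Python-level loop and
# returns a NumPy Boolean array, one element per row.
mask = np.vectorize(isThree)(df['A'], df['B'])

# A Boolean array of the right length can index the DataFrame directly.
threes = df[mask]
```

Note that np.vectorize is a convenience wrapper, not a speed-up; it still calls the function once per element.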

Upvotes: -2

rahlf23

Reputation: 9019

In this case, I would recommend using np.where(). See the following example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'x': [1,2,4,2,3,1,2,3,4,0], 'y': [0,1,2,0,0,2,4,0,1,2]})

df['3 Pointer'] = np.where(df['x']+df['y']==3, 1, 0)

Yields:

   x  y  3 Pointer
0  1  0          0
1  2  1          1
2  4  2          0
3  2  0          0
4  3  0          1
5  1  2          1
6  2  4          0
7  3  0          1
8  4  1          0
9  0  2          0
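
If the goal is to keep only the matching rows instead of flagging them, the condition passed to np.where() can serve as the filter on its own. A sketch using the same data (the names `is_three` and `three_pointers` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'x': [1,2,4,2,3,1,2,3,4,0], 'y': [0,1,2,0,0,2,4,0,1,2]})

# The comparison itself is already a Boolean Series ...
is_three = df['x'] + df['y'] == 3

# ... so it can select the matching rows without adding a flag column.
three_pointers = df[is_three]
```

This selects the rows at index 1, 4, 5 and 7, the same rows marked with 1 in the table above.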

Upvotes: 0
