Recessive

Reputation: 1949

Filter pandas index by function

I want to filter a pandas dataframe by a function along the index. I can't seem to find a built-in way of performing this action.

Essentially, I have a function, which I'll call filter_func for this example, that uses some arbitrarily complicated logic to decide whether a particular index label should be included. I want exactly what the code below does to the index, but applied to the DataFrame:

new_index = filter(filter_func, df.index)

That is, keep only the rows whose index label filter_func accepts. The index could be of any type.

This seems like a fairly fundamental part of data manipulation, so I imagine there's a built-in way to do it.
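A minimal sketch of one built-in route, assuming pandas 1.x or 2.x: Index.map calls a function once per index label, and the resulting booleans can be used as a row mask. Note that this still materializes one boolean per row, so it does not by itself avoid the extra memory.

```python
import pandas as pd

df = pd.DataFrame({"value": [12, 34, 2, 23, 6, 23, 7, 2, 35, 657, 1, 324]})


def filter_func(ind, n=0):
    # Predicate from the question: repeatedly map ind -> 2*ind - 1 (at most
    # 200 times) and accept as soon as a value is divisible by 79.
    if n > 200:
        return False
    if ind % 79 == 0:
        return True
    return filter_func(ind + ind - 1, n + 1)


# Index.map calls filter_func once per label and returns a new Index of the
# results; to_numpy(dtype=bool) turns it into a plain boolean array.
mask = df.index.map(filter_func).to_numpy(dtype=bool)

# A boolean array with one entry per row selects the rows marked True.
result = df[mask]
print(result)
```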

ETA:

I found that indexing the DataFrame with a list of booleans does what I want, but it needs a mask with one entry per index label, i.e. double the space of the index, just to apply the filter. So my question remains: is there a built-in way to do this that does not require twice the space?

Here's an example:

import pandas as pd
df = pd.DataFrame({"value":[12,34,2,23,6,23,7,2,35,657,1,324]})

def filter_func(ind, n=0):
    if n > 200: return False
    if ind % 79 == 0: return True
    return filter_func(ind+ind-1, n+1)

new_index = filter(filter_func, df.index)
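Applied to df.index, filter yields the accepted labels themselves rather than a mask, and those labels can be passed straight to .loc. A minimal sketch, assuming the index labels are unique (with duplicate labels, each accepted label would select every row carrying it once per occurrence, so rows would be repeated):

```python
import pandas as pd

df = pd.DataFrame({"value": [12, 34, 2, 23, 6, 23, 7, 2, 35, 657, 1, 324]})


def filter_func(ind, n=0):
    # Predicate from the question: repeatedly map ind -> 2*ind - 1 (at most
    # 200 times) and accept as soon as a value is divisible by 79.
    if n > 200:
        return False
    if ind % 79 == 0:
        return True
    return filter_func(ind + ind - 1, n + 1)


# filter() keeps only the labels filter_func accepts; this list holds just
# the accepted labels, not one entry per row.
kept_labels = list(filter(filter_func, df.index))

# .loc selects rows by label (assumes unique labels, see above).
result = df.loc[kept_labels]
print(result)
```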

This is what I want to do:

# Build one boolean per index label, then use the list as a row mask
mask = []
for i in df.index:
    mask.append(filter_func(i))
df = df[mask]
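If the concern is the mask's footprint, one option (a sketch, not a built-in filter-by-function) is to fill a NumPy boolean array directly from a lazy map. np.fromiter with count preallocates the array, so the extra storage is one byte per row, rather than the 8-byte pointer per element that a Python list of bools holds on a 64-bit build. The mask is still materialized, and df[mask] still copies the selected rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [12, 34, 2, 23, 6, 23, 7, 2, 35, 657, 1, 324]})


def filter_func(ind, n=0):
    # Predicate from the question: repeatedly map ind -> 2*ind - 1 (at most
    # 200 times) and accept as soon as a value is divisible by 79.
    if n > 200:
        return False
    if ind % 79 == 0:
        return True
    return filter_func(ind + ind - 1, n + 1)


# map() is lazy; np.fromiter consumes it one label at a time and writes each
# result into a preallocated bool array of length len(df.index).
mask = np.fromiter(map(filter_func, df.index), dtype=bool, count=len(df.index))

result = df[mask]
print(mask.nbytes, "bytes for", len(df), "rows")
print(result)
```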

but without needing an extra mask the size of the index, which doubles the space required.

Upvotes: 5

Views: 3138

Answers (3)

Appliqué

Reputation: 123

If you want to avoid referencing df explicitly inside the filtering condition, you can use the following:

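import pandas as pd
# dtype=object lets a rejected row be filled with None, which dropna() then removes
df = pd.DataFrame({"value":[12,34,2,23,6,23,7,2,35,657,1,324]}, dtype=object)

# filter_func is the function from the question; x.name is the row's index label
df.apply(lambda x: x if filter_func(x.name) else None, axis=1, result_type='broadcast').dropna()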
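The same x.name idea can return a boolean per row instead of the row itself. That avoids the dtype=object / None / dropna() round trip, keeps the column's original dtype, and avoids dropna() discarding accepted rows that happen to contain missing values. A minimal sketch, reusing the question's filter_func and data:

```python
import pandas as pd

df = pd.DataFrame({"value": [12, 34, 2, 23, 6, 23, 7, 2, 35, 657, 1, 324]})


def filter_func(ind, n=0):
    # Predicate from the question: repeatedly map ind -> 2*ind - 1 (at most
    # 200 times) and accept as soon as a value is divisible by 79.
    if n > 200:
        return False
    if ind % 79 == 0:
        return True
    return filter_func(ind + ind - 1, n + 1)


# With axis=1, each row arrives as a Series whose .name is its index label,
# so the predicate is applied to the label without referencing df inside it.
keep = df.apply(lambda row: filter_func(row.name), axis=1)

# keep is a boolean Series aligned with df's index; the data is not converted.
result = df[keep]
print(result)
```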

Upvotes: 0

anky

Reputation: 75120

You can use map instead of filter and then use boolean indexing:

df.loc[map(filter_func,df.index)]

   value
0     12
4      6
7      2
8     35
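Because map only iterates over the labels, the same pattern works for an index of any type, which the question asks about. A minimal sketch with a string index and a hypothetical predicate, starts_with_vowel, that is not from the original thread:

```python
import pandas as pd


def starts_with_vowel(label):
    # Hypothetical predicate for illustration: accept labels whose first
    # character is a vowel (empty labels are rejected).
    s = str(label)
    return bool(s) and s[0].lower() in "aeiou"


df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=["apple", "kiwi", "orange", "pear"])

# map() yields one bool per label; .loc turns the iterator into a list and
# uses it as a boolean row mask.
result = df.loc[map(starts_with_vowel, df.index)]
print(result)
```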

Upvotes: 4

Praneeth Katuri

Reputation: 3

Have you tried using df.apply?

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=['a', 'b', 'c'])
>>> df
   a  b  c
0  0  1  2
1  3  4  5
2  6  7  8
>>> df[df.apply(lambda x: x['c'] % 2 == 0, axis=1)]
   a  b  c
0  0  1  2
2  6  7  8

Inside the lambda, x is one row of the DataFrame, so you can customize the condition however you want (the row's index label is available as x.name). Let me know if this isn't what you're looking for.
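When a condition can be written with vectorized operations, the boolean mask can be built directly from the column or the index, with no per-row Python call. A minimal sketch on the same 3x3 frame; this only applies to predicates expressible that way, which the question's recursive filter_func is not:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=["a", "b", "c"])

# Element-wise arithmetic and comparison on a column give a boolean Series.
by_column = df[df["c"] % 2 == 0]

# Comparison on the index itself gives a boolean array of the same length.
by_index = df[df.index > 0]

print(by_column)
print(by_index)
```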

Upvotes: 0
