Arnold Klein

Reputation: 3086

Conditional slicing with Pandas (the elegant way)

I need to split the entire data frame (Pandas) based on some criteria. For example:

import pandas as pd
import numpy as np

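# 10 random floats scaled to [0, 100) so the threshold of 30 below is meaningful
a = np.random.random(size=10) * 100
b = np.random.randint(100, size=a.shape)
# Stack the two 1-D arrays as columns (the bare `array` name was undefined)
df = pd.DataFrame(np.column_stack((a, b)), columns=['a', 'b'])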

Now I want to split the data frame into two pieces, one where df['a'] < 30 and one where df['a'] >= 30:

df_two = [df[df['a'] < 30], df[df['a']>=30]]

Is there a more elegant way of splitting a data frame based on some condition, for example with a list comprehension or a loop? If I have more than one condition, I would like to iterate over a list of conditions rather than write each split by hand.

Upvotes: 2

Views: 4844

Answers (2)

pansen

Reputation: 6663

Let's say you have multiple conditions expressed via strings in an iterable like:

conditions = ("a < 30", "a > 50", "b < 10 and a > 10")

You can iterate over them and create each split (the rows that match the condition and the rows that do not) using a dictionary comprehension and a small helper function, splitter:

def splitter(sub_df, condition):
    bool_split = sub_df.eval(condition)
    return (sub_df[bool_split], sub_df[~bool_split])

splits = {cond: splitter(df, cond) for cond in conditions}

The result is a dictionary with the condition strings as keys and, as values, tuples of two data frames: the matching rows first, then the remaining rows.
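As a rough usage sketch, here is the same splitter applied to a small fixed data frame, so the outcome is easy to check by eye. The sample values are made up for illustration and are not from the question.

```python
import pandas as pd

# Small deterministic frame (illustrative values, not from the question)
df = pd.DataFrame({"a": [5, 20, 40, 60, 80], "b": [3, 50, 8, 70, 1]})

conditions = ("a < 30", "a > 50", "b < 10 and a > 10")

def splitter(sub_df, condition):
    # Evaluate the condition string to a boolean Series, one value per row
    bool_split = sub_df.eval(condition)
    # Matching rows first, remaining rows second
    return (sub_df[bool_split], sub_df[~bool_split])

splits = {cond: splitter(df, cond) for cond in conditions}

# Unpack one split by its condition string
matching, rest = splits["b < 10 and a > 10"]
print(matching)
print(rest)
```

Each key is looked up by the exact condition string, so any typo or spacing difference in the key raises a KeyError.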

Upvotes: 1

jezrael

Reputation: 862611

You can use:

mask = df['a'] < 30
df_two = [df[mask], df[~mask]]
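If there are several conditions, the same mask idea could be wrapped in a loop. The following is only a sketch: the condition names and the sample data are hypothetical, and each condition is assumed to be a function that returns a boolean mask for the frame.

```python
import pandas as pd

# Illustrative data, not from the question
df = pd.DataFrame({"a": [5, 20, 40, 60, 80], "b": [3, 50, 8, 70, 1]})

# Hypothetical named conditions; each builds a boolean mask from the frame
conditions = {
    "a_below_30": lambda d: d["a"] < 30,
    "b_below_10": lambda d: d["b"] < 10,
}

pieces = {}
for name, make_mask in conditions.items():
    mask = make_mask(df)
    # Same pattern as above: matching rows, then the complement
    pieces[name] = [df[mask], df[~mask]]

low_a, high_a = pieces["a_below_30"]
print(low_a)
print(high_a)
```

Computing the mask once per condition and reusing it for both halves avoids evaluating the comparison twice.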

Upvotes: 2
