Reputation: 3086
I need to split the entire data frame (Pandas) based on some criteria. For example:
import pandas as pd
import numpy as np
# scale to [0, 100) so that a threshold of 30 is meaningful
a = 100 * np.random.random(size=10)
b = np.random.randint(100, size=a.shape)
df = pd.DataFrame(np.array((a, b)).T, columns=['a', 'b'])
Now I want to split the data frame into two pieces: the rows where df['a'] < 30 and the rows where df['a'] >= 30:
df_two = [df[df['a'] < 30], df[df['a']>=30]]
Is there a more elegant way of splitting a data frame based on some condition, for example with a list comprehension or a loop? If I have more than just one condition, I would like to iterate over a list of conditions rather than write each split out manually.
Upvotes: 2
Views: 4844
Reputation: 6663
Let's say you have multiple conditions expressed via strings in an iterable like:
conditions = ("a < 30", "a > 50", "b < 10 and a > 10")
You can iterate over them and create each split (the rows that satisfy the condition and the rows that do not) using a dictionary comprehension and a small helper function, splitter:
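def splitter(sub_df, condition):
    # evaluate the condition string against the columns to get a boolean mask
    bool_split = sub_df.eval(condition)
    # rows that satisfy the condition, then rows that do not
    return (sub_df[bool_split], sub_df[~bool_split])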
splits = {cond: splitter(df, cond) for cond in conditions}
The result is a dictionary with the condition strings as keys and, as values, tuples of two data frames: the rows that match the condition and the rows that do not.
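As a rough sketch of how the result can be used, here is a self-contained version with a small made-up data frame (the sample values are only an assumption for illustration, picked so that every condition matches some rows):

```python
import pandas as pd

# Hypothetical sample data, not from the question; chosen so each condition matches some rows.
df = pd.DataFrame({"a": [5, 20, 35, 60, 80], "b": [3, 50, 8, 70, 9]})

conditions = ("a < 30", "a > 50", "b < 10 and a > 10")


def splitter(sub_df, condition):
    # Boolean mask from the condition string, evaluated against the columns.
    bool_split = sub_df.eval(condition)
    return sub_df[bool_split], sub_df[~bool_split]


splits = {cond: splitter(df, cond) for cond in conditions}

# Each value is a (matching, non_matching) pair of data frames.
matching, non_matching = splits["a < 30"]
print(matching)

# For every condition, the two pieces together cover each row exactly once.
for cond, (inside, outside) in splits.items():
    assert len(inside) + len(outside) == len(df)
    print(f"{cond!r}: {len(inside)} matching, {len(outside)} not matching")
```

Note that the conditions are applied independently to the full data frame, so a row can show up in the matching part of more than one condition.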
Upvotes: 1