baxx
baxx

Reputation: 4705

Create two dataframes in pandas based on values in columns

Given the following data

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7], "b": [4, 5, 9, 5, 6, 4, 0]})
df["split_by"] = df["b"].eq(9)

which looks as

a  b  split_by
0  1  4     False
1  2  5     False
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False

I would like to create two dataframes as follows:

   a  b  split_by
0  1  4     False
1  2  5     False

and

   a  b  split_by
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False

Clearly this is based on the value in column split_by, but I'm not sure how to subset using this.

My approach is:

split_1 = df.index < df[df["split_by"].eq(True)].index.to_list()[0]
split_2 = ~df.index.isin(split_1)

df1 = df[split_1]
df2 = df[split_2]

Upvotes: 1

Views: 44

Answers (3)

Fab
Fab

Reputation: 161

Another approach:

i = df[df['split_by']==True].index.values[0]
df1 = df.iloc[:i]
df2 = df.iloc[i:]

This is assuming you have only one "True". If you have more than one "True", this code will split df into only two dataframes regardless, considering only the first "True".

Upvotes: 1

BENY
BENY

Reputation: 323236

Use groupby with cumsum , notice if you have more then one True , this will split the dataframe to n+1 dfs (n True)

d={x : y for x , y in df.groupby(df.split_by.cumsum())}
d[0]
   a  b  split_by
0  1  4     False
1  2  5     False
d[1]
   a  b  split_by
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False

Upvotes: 0

Space Impact
Space Impact

Reputation: 13255

Use argmax as:

true_index = df['split_by'].argmax()
df1 = df.loc[:true_index-1, :]
df2 = df.loc[true_index:, :]

print(df1)
   a  b  split_by
0  1  4     False
1  2  5     False

print(df2)
   a  b  split_by
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False

Upvotes: 2

Related Questions