Reputation: 4705
Given the following data
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7], "b": [4, 5, 9, 5, 6, 4, 0]})
df["split_by"] = df["b"].eq(9)
which looks as
a b split_by
0 1 4 False
1 2 5 False
2 3 9 True
3 4 5 False
4 5 6 False
5 6 4 False
6 7 0 False
I would like to create two dataframes as follows:
a b split_by
0 1 4 False
1 2 5 False
and
a b split_by
2 3 9 True
3 4 5 False
4 5 6 False
5 6 4 False
6 7 0 False
Clearly this is based on the value in column split_by
, but I'm not sure how to subset using this.
My approach is:
split_1 = df.index < df[df["split_by"].eq(True)].index.to_list()[0]
split_2 = ~df.index.isin(split_1)
df1 = df[split_1]
df2 = df[split_2]
Upvotes: 1
Views: 44
Reputation: 161
Another approach:
i = df[df['split_by']==True].index.values[0]
df1 = df.iloc[:i]
df2 = df.iloc[i:]
This is assuming you have only one "True". If you have more than one "True", this code will split df into only two dataframes regardless, considering only the first "True".
Upvotes: 1
Reputation: 323236
Use groupby
with cumsum
, notice if you have more then one True , this will split the dataframe to n+1 dfs (n True)
d={x : y for x , y in df.groupby(df.split_by.cumsum())}
d[0]
a b split_by
0 1 4 False
1 2 5 False
d[1]
a b split_by
2 3 9 True
3 4 5 False
4 5 6 False
5 6 4 False
6 7 0 False
Upvotes: 0
Reputation: 13255
Use argmax
as:
true_index = df['split_by'].argmax()
df1 = df.loc[:true_index-1, :]
df2 = df.loc[true_index:, :]
print(df1)
a b split_by
0 1 4 False
1 2 5 False
print(df2)
a b split_by
2 3 9 True
3 4 5 False
4 5 6 False
5 6 4 False
6 7 0 False
Upvotes: 2