Reputation: 4705

Create two dataframes in pandas based on values in columns

Given the following data

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7], "b": [4, 5, 9, 5, 6, 4, 0]})
df["split_by"] = df["b"].eq(9)

which looks as

a  b  split_by
0  1  4     False
1  2  5     False
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False

I would like to create two dataframes as follows:

   a  b  split_by
0  1  4     False
1  2  5     False

and

   a  b  split_by
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False

Clearly this is based on the value in column split_by, but I'm not sure how to subset using this.

My approach is:

split_1 = df.index < df[df["split_by"].eq(True)].index.to_list()[0]
split_2 = ~df.index.isin(split_1)

df1 = df[split_1]
df2 = df[split_2]

Upvotes: 1

Answers (3)

Fab

Reputation: 161

Another approach:

i = df[df['split_by']==True].index.values[0]
df1 = df.iloc[:i]
df2 = df.iloc[i:]

This is assuming you have only one "True". If you have more than one "True", this code will split df into only two dataframes regardless, considering only the first "True".

Upvotes: 1

BENY

Reputation: 323236

Use groupby with cumsum , notice if you have more then one True , this will split the dataframe to n+1 dfs (n True)

d={x : y for x , y in df.groupby(df.split_by.cumsum())}
d[0]
   a  b  split_by
0  1  4     False
1  2  5     False
d[1]
   a  b  split_by
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False

Upvotes: 0

Space Impact

Reputation: 13255

Use argmax as:

true_index = df['split_by'].argmax()
df1 = df.loc[:true_index-1, :]
df2 = df.loc[true_index:, :]

print(df1)
   a  b  split_by
0  1  4     False
1  2  5     False

print(df2)
   a  b  split_by
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False

Upvotes: 2

Create two dataframes in pandas based on values in columns

Answers (3)

Related Questions