Reputation: 117
To prepare my data for an ML task, I need to split my original dataframe into multiple smaller dataframes. For every occurrence of 1 in column 'BOOL', I want all the rows above and including that row, i.e. n dataframes, where n is the number of occurrences of 1.
A sample of the data:
import pandas as pd
df = pd.DataFrame({"USER_ID": ['001', '001', '001', '001', '001'],
                   'VALUE': [1, 2, 3, 4, 5], "BOOL": [0, 1, 0, 1, 0]})
The expected output is 2 dataframes: the first containing rows 0 to 1, and the second containing rows 0 to 3.
I have considered a for loop with if-else statements that appends rows, but it is highly inefficient for the dataset I am using. I am looking for a more Pythonic way of doing this.
Upvotes: 3
Views: 5362
Reputation: 323266
I think using a for loop is better here:
idx = df.BOOL.to_numpy().nonzero()[0]  # Series.nonzero() was removed in pandas 1.0
d = {x: df.iloc[:y+1, :] for x, y in enumerate(idx)}
d[0]
   BOOL USER_ID  VALUE
0     0     001      1
1     1     001      2
Upvotes: 3
Reputation: 71580
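Why not use a list comprehension? For example (this relies on the default RangeIndex, because the index labels are passed to iloc as positions):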
>>> l=[df.iloc[:i+1] for i in df.index[df['BOOL']==1]]
>>> l[0]
   BOOL USER_ID  VALUE
0     0     001      1
1     1     001      2
>>> l[1]
   BOOL USER_ID  VALUE
0     0     001      1
1     1     001      2
2     0     001      3
3     1     001      4
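If the dataframe does not have the default 0..n-1 index, the labels can no longer be used as positions. Here is a minimal sketch of the same idea using positions from np.flatnonzero instead. The string index and the names `positions` and `frames` are only assumptions for illustration, not part of the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "USER_ID": ["001"] * 5,
        "VALUE": [1, 2, 3, 4, 5],
        "BOOL": [0, 1, 0, 1, 0],
    },
    index=list("abcde"),  # non-default labels, purely for illustration
)

# Integer positions (not labels) of the rows where BOOL == 1.
positions = np.flatnonzero(df["BOOL"].to_numpy() == 1)

# One cumulative slice per occurrence: every row up to and including it.
frames = [df.iloc[: pos + 1] for pos in positions]

for frame in frames:
    print(frame, end="\n\n")
```

Each element of `frames` is a slice of `df`, so take a `.copy()` of it before modifying it.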
Upvotes: 2
Reputation: 36249
You can use np.split, which accepts an array of indices at which to split:
np.split(df, *np.where(df.BOOL == 1))
If you want the rows with BOOL == 1 to be included in the preceding dataframe, just add 1 to all the indices:
np.split(df, np.where(df.BOOL == 1)[0] + 1)
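Note that splitting this way gives consecutive, non-overlapping chunks (plus a trailing remainder) rather than cumulative frames that each start at row 0. As a rough sketch, the same chunks can be built with plain iloc slicing, which does not depend on np.split accepting a DataFrame. The names `cuts`, `bounds` and `chunks` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"USER_ID": ["001"] * 5,
                   "VALUE": [1, 2, 3, 4, 5],
                   "BOOL": [0, 1, 0, 1, 0]})

# Cut just after every row where BOOL == 1.
cuts = np.flatnonzero(df["BOOL"].to_numpy() == 1) + 1

# Chunk boundaries: start of the frame, each cut, end of the frame.
bounds = [0, *cuts, len(df)]

# Consecutive, non-overlapping slices between neighbouring boundaries.
chunks = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

for chunk in chunks:
    print(chunk, end="\n\n")
```

For the sample data this produces three pieces: rows 0 to 1, rows 2 to 3, and the leftover row 4.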
Upvotes: 5