Reputation: 9
I have the following dataframe:
import pandas as pd
data = {'Name': ['Ankit', 'Amit', 'Aishwarya',
'Priyanka','Kovacs','Tompos'],
'Age': [21, 19, 20, 18, 20, 19],
'Stream': ['Math', 'Commerce', 'Arts',
'Biology', 'Math', 'Arts'],
'Percentage': [88, 92, 95, 70, 80, 83]}
df = pd.DataFrame(data, columns=['Name', 'Age', 'Stream',
'Percentage'])
Name Age Stream Percentage
0 Ankit 21 Math 88
1 Amit 19 Commerce 92
2 Aishwarya 20 Arts 95
3 Priyanka 18 Biology 70
4 Kovacs 20 Math 80
5 Tompos 19 Arts 83
I would like to split the dataframe based on rows that contain Arts ins Stream column in the following way: the rows between the two rows that containing the Arts form a separate dataframe.
My code is the following:
first=df.iloc[0:2]
second=df.iloc[3:5]
But I would like to replace this with a code that takes into account if the index changes, applicable to larger dataframes and. Can someone help me with this?
Thanks!
Upvotes: 1
Views: 49
Reputation: 262204
Set up a grouper with eq
and cumsum
, then groupby
to split. Reverse the Series with [:::-1]
to group the previous rows with each "Arts":
dfs = [g for _, g in df.groupby(df.loc[::-1, 'Stream'].eq('Arts').cumsum(),
sort=False)]
Output:
[ Name Age Stream Percentage
0 Ankit 21 Math 88
1 Amit 19 Commerce 92
2 Aishwarya 20 Arts 95,
Name Age Stream Percentage
3 Priyanka 18 Biology 70
4 Kovacs 20 Math 80
5 Tompos 19 Arts 83]
Intermediates:
Name Age Stream Percentage eq('Arts') cumsum
0 Ankit 21 Math 88 False 2
1 Amit 19 Commerce 92 False 2
2 Aishwarya 20 Arts 95 True 2
3 Priyanka 18 Biology 70 False 1
4 Kovacs 20 Math 80 False 1
5 Tompos 19 Arts 83 True 1
And if you want to exclude the "Arts":
m = df.loc[::-1, 'Stream'].eq('Arts')
dfs = [g for _, g in df[~m].groupby(m.cumsum(), sort=False)]
Output:
[ Name Age Stream Percentage
0 Ankit 21 Math 88
1 Amit 19 Commerce 92,
Name Age Stream Percentage
3 Priyanka 18 Biology 70
4 Kovacs 20 Math 80]
Upvotes: 1