I have a Dataframe like this:
import numpy as np
import pandas as pd
df=pd.DataFrame([['travail','hk','hj'],['test',6,6],[5,5,8],[4,3,1],['moyenne',5,6],[5,6,7],
[1,2,3],['travail','test','kkj'],[5,'hjjd',8],['moyenne',6,7],[5,5,8],[4,3,1],['hkk',5,6],[5,5,8],
[7,8,5]],columns=['A','B','C'])
I want to select all rows from each travail to the next moyenne (inclusive) in column A and obtain:
A B C
0 travail hk hj
1 test 6 6
2 5 5 8
3 4 3 1
4 moyenne 5 6
7 travail test kkj
8 5 hjjd 8
9 moyenne 6 7
How can I do that?
One can use a for loop with iloc to check each row, using a flag to track whether the current row is inside a travail ... moyenne block, and keep the rows that fall within the desired blocks:
rows = []       # positions of the rows to keep
flag = False    # True while inside a travail ... moyenne block
for i in range(len(df)):
    firstval = df.iloc[i, 0]
    if firstval == 'travail':
        rows.append(i)
        flag = True
    elif firstval == 'moyenne':
        rows.append(i)
        flag = False
    elif flag:
        rows.append(i)
# DataFrame.append was removed in pandas 2.0, so select all kept rows at once
newdf = df.iloc[rows]
print(newdf)
Output:
A B C
0 travail hk hj
1 test 6 6
2 5 5 8
3 4 3 1
4 moyenne 5 6
7 travail test kkj
8 5 hjjd 8
9 moyenne 6 7
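The same loop can be wrapped in a reusable function. This is a minimal sketch: the name `select_blocks` and its parameters are hypothetical. One small difference from the loop above is that a moyenne appearing outside any block is not kept here.

```python
import pandas as pd


def select_blocks(df, col, start, end):
    """Hypothetical helper: keep rows from each `start` marker up to and
    including the next `end` marker in column `col`."""
    keep = []          # integer positions of the rows to keep
    inside = False     # True between a start marker and the next end marker
    for pos, val in enumerate(df[col]):
        if val == start:
            inside = True
        if inside:
            keep.append(pos)
        if val == end:
            inside = False   # the end row itself was already kept above
    return df.iloc[keep]


df = pd.DataFrame([['travail', 'hk', 'hj'], ['test', 6, 6], [5, 5, 8], [4, 3, 1],
                   ['moyenne', 5, 6], [5, 6, 7], [1, 2, 3], ['travail', 'test', 'kkj'],
                   [5, 'hjjd', 8], ['moyenne', 6, 7], [5, 5, 8], [4, 3, 1],
                   ['hkk', 5, 6], [5, 5, 8], [7, 8, 5]], columns=['A', 'B', 'C'])

result = select_blocks(df, 'A', 'travail', 'moyenne')
print(result)
```

On the sample frame this keeps rows 0 to 4 and 7 to 9.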
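Compare the column with Series.eq (==), reverse the order of the second mask with Series.iloc, take Series.cumsum and compare again with Series.gt (>). m1 is then True from the first travail onwards and m2 is True up to and including the last moyenne. Finally, chain the masks with & for bitwise AND and filter by boolean indexing: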
m1 = df['A'].eq('travail').cumsum().gt(0)
m2 = df['A'].eq('moyenne').iloc[::-1].cumsum().gt(0)
df1 = df[m1 & m2]
print (df1)
A B C
0 travail hk hj
1 test 6 6
2 5 5 8
3 4 3 1
4 moyenne 5 6
5 5 6 7
6 1 2 3
7 travail test kkj
8 5 hjjd 8
9 moyenne 6 7
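The output above also contains rows 5 and 6, which lie between the two blocks and are not part of the expected result in the question. A per-block variant of the same cumsum idea compares how many blocks have been opened with how many have been closed. This is a sketch assuming travail and moyenne alternate, i.e. every travail is closed by a moyenne before the next travail starts.

```python
import pandas as pd

df = pd.DataFrame([['travail', 'hk', 'hj'], ['test', 6, 6], [5, 5, 8], [4, 3, 1],
                   ['moyenne', 5, 6], [5, 6, 7], [1, 2, 3], ['travail', 'test', 'kkj'],
                   [5, 'hjjd', 8], ['moyenne', 6, 7], [5, 5, 8], [4, 3, 1],
                   ['hkk', 5, 6], [5, 5, 8], [7, 8, 5]], columns=['A', 'B', 'C'])

# Number of 'travail' markers seen so far, including the current row.
starts = df['A'].eq('travail').cumsum()
# Number of 'moyenne' markers seen strictly before the current row;
# shifting by one keeps the 'moyenne' row itself inside its block.
ends = df['A'].eq('moyenne').shift(fill_value=False).cumsum()
# A row is inside a block when more blocks have been opened than closed.
df2 = df[starts.gt(ends)]
print(df2)
```

On the sample frame this keeps rows 0 to 4 and 7 to 9, matching the output asked for in the question.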
If both values always exist in column A, the same range can be selected with Series.idxmax and DataFrame.loc:
a = df['A'].eq('travail').idxmax()
b = df['A'].eq('moyenne').iloc[::-1].idxmax()
df1 = df.loc[a:b]
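The condition matters because idxmax on an all-False boolean Series returns the first label instead of raising, so a missing marker would silently produce a wrong slice. A minimal guarded sketch, assuming an empty frame is an acceptable fallback (the names `is_start` and `is_end` are just illustrative):

```python
import pandas as pd

df = pd.DataFrame([['travail', 'hk', 'hj'], ['test', 6, 6], [5, 5, 8], [4, 3, 1],
                   ['moyenne', 5, 6], [5, 6, 7], [1, 2, 3], ['travail', 'test', 'kkj'],
                   [5, 'hjjd', 8], ['moyenne', 6, 7], [5, 5, 8], [4, 3, 1],
                   ['hkk', 5, 6], [5, 5, 8], [7, 8, 5]], columns=['A', 'B', 'C'])

is_start = df['A'].eq('travail')
is_end = df['A'].eq('moyenne')

if is_start.any() and is_end.any():
    a = is_start.idxmax()            # label of the first 'travail'
    b = is_end.iloc[::-1].idxmax()   # label of the last 'moyenne'
    df1 = df.loc[a:b]                # label slicing includes both ends
else:
    df1 = df.iloc[0:0]               # empty frame with the same columns
print(df1)
```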