Ikaro
Ikaro

Reputation: 35

How can I find the highest value between rows every time that they met a certain condition?

I have been struggling with a problem with my data frame build in pandas that is current like this

MyDataFrame:

Index   Status    Value
0              A            10
1              A             8
2              A             5
3              B             9
4              B             5
5              A             1
6              B             2
7              A             3
8              A             5
9              A             1

The desired output would be:
Index   Status    Value
0              A            10
1              B             9
2              A             1
3              B             2
4              A             5

So far I tried to use range and while conditions to filter, however, if I put a conditional like :

for i in range:
        if Status[i] == "A":
          print(Value[i])
        if  Status == "B":
           break

** The code above is more an example of what I have been trying to reach my goal, I tried to use .iloc and range with while, but maybe in the wrong way idk.*

The desired output isn't printed.

One thing that complicates this filtering process is that MyDataFrame changes every time that I run the script since it uses another base of data to create this DataFrame.

I believe that I'm missing something simple, but it has been almost a week and I can't figure out.

Thanks in advance for all your answers and support.

Upvotes: 2

Views: 42

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Very close to @BEN_YO:

grp = (df['Status'] != df['Status'].shift()).cumsum()
df.loc[df.groupby(grp)['Value'].idxmax()]

Output:

      Status  Value
Index              
0          A     10
3          B      9
5          A      1
6          B      2
8          A      5

Create groups using shift and inequality with cumsum, then groupby and find the index of the max value of 'Value', idxmax, and filter the dataframe using loc

Upvotes: 2

BENY
BENY

Reputation: 323236

Let us try using shift with cumsum create the groupby key , then it is groupby + agg

out = df.groupby(df.Status.ne(df.Status.shift()).cumsum()).agg({'Status':'first','Value':'max'})
Out[14]: 
       Status  Value
Status              
1           A     10
2           B      9
3           A      1
4           B      2
5           A      5

Upvotes: 2

Related Questions