John
John

Reputation: 515

Create column by comparing two data frames with groupby

I've got one large data frame with columns like so:

TimeHrs     A    B    SeqNum

I want to figure out if the max of A for each group grouped by the SeqNum happens within +/-2 seconds of the max of B. For now, I've got a groupby going for each to obtain the rows of the max values of A and B like so:

idxmaxA = df.loc[df.groupby(['SeqNum'])['A'].idxmax()]
idxmaxB = df.loc[df.groupby(['SeqNum'])['B'].idxmax()]

This gives me two data frames with the indexes/rows of the max of A and B for each group. How would I go about comparing the results to create a column of boolean values testing if the max of A is within 2 seconds of the max of B that I can add to my original dataframe? In my case TimeHrs is 1 second data, so each row is 1 second from the last. Because of this, I thought maybe comparing the index values for each group in idxmaxA falls within +/- 2 rows of the idxmaxB, but I'm not sure how to easily do that.

Upvotes: 0

Views: 221

Answers (1)

jpp
jpp

Reputation: 164673

Is there any reason why you can't perform this in 2 steps?

In the below example, you find the max of each column groupwise, then perform your comparison using pd.Series.between.

import pandas as pd

df = pd.DataFrame([[8, 10, 1], [1, 4, 1], [5, 8, 1],
                   [5, 15, 2], [16, 10, 2], [-5, 3, 2],
                   [5, 10, 3], [15, 4, 3], [-5, 12, 3]],
                  columns=['A', 'B', 'SeqNum'])

res = df.groupby('SeqNum', as_index=False).max()
res['test'] = res['B'].between(res['A']-2, res['A']+2)

#    SeqNum   A   B   test
# 0       1   8  10   True
# 1       2  16  15   True
# 2       3  15  12  False

Upvotes: 1

Related Questions