Alissa L.
Alissa L.

Reputation: 103

Group by ID and check if values are the same within a margin of error?

I have a data frame like this:

ID    Value
111   10
111   5
112   11
112   11

And I want to create a third column "Check" that will be a binary 1 or 0 (True or False) to the following condition: Are all the values for the same ID number the same within a 5% margin of error?

For example, for ID number 111, the column Check would be 1 if the values 10 and 5 are the same or if the percentage difference between them is 5%. Pct. difference being calculated as abs(5 - 10) / (5 + 10)/2

The output would then be:

ID    Value. Check
111   10    0
111   5     0
112   11    1
112   11    1

I am using the following code:

a = df.groupby([df['ID']])['Value'].nunique().eq(1)
index_list = a[a].index.tolist()

df['Check'] = 0
df.loc[df['ID'].isin(index_list), 'Check'] = 1

but it only checks if the values are the same and I am not sure to incorporate the 5 pct. difference in the check.

I would also only like to do this for when we have more than one observation per ID number, and return an NaN to the column Check when it's only one observation?

Thank you!!

Upvotes: 0

Views: 404

Answers (2)

BENY
BENY

Reputation: 323346

Try transform

g = df.groupby('ID')['Value']

df['new'] = ((g.transform(np.ptp)/g.transform('mean'))<0.05).astype(int)
df
Out[40]: 
    ID  Value  new
0  111     10    0
1  111      5    0
2  112     11    1
3  112     11    1

Upvotes: 1

Valdi_Bo
Valdi_Bo

Reputation: 31011

Run:

df['Check'] = df.groupby([df['ID']]).Value.transform(
    lambda grp: (grp.max() - grp.min()) / grp.mean() < 0.05)

Upvotes: 0

Related Questions