Reputation: 103
I have a data frame like this:
ID Value
111 10
111 5
112 11
112 11
And I want to create a third column "Check" that will be a binary 1 or 0 (True or False) to the following condition: Are all the values for the same ID number the same within a 5% margin of error?
For example, for ID number 111, the column Check would be 1 if the values 10 and 5 are the same or if the percentage difference between them is 5%. Pct. difference being calculated as abs(5 - 10) / (5 + 10)/2
The output would then be:
ID Value. Check
111 10 0
111 5 0
112 11 1
112 11 1
I am using the following code:
a = df.groupby([df['ID']])['Value'].nunique().eq(1)
index_list = a[a].index.tolist()
df['Check'] = 0
df.loc[df['ID'].isin(index_list), 'Check'] = 1
but it only checks if the values are the same and I am not sure to incorporate the 5 pct. difference in the check.
I would also only like to do this for when we have more than one observation per ID number, and return an NaN to the column Check when it's only one observation?
Thank you!!
Upvotes: 0
Views: 404
Reputation: 323346
Try transform
g = df.groupby('ID')['Value']
df['new'] = ((g.transform(np.ptp)/g.transform('mean'))<0.05).astype(int)
df
Out[40]:
ID Value new
0 111 10 0
1 111 5 0
2 112 11 1
3 112 11 1
Upvotes: 1
Reputation: 31011
Run:
df['Check'] = df.groupby([df['ID']]).Value.transform(
lambda grp: (grp.max() - grp.min()) / grp.mean() < 0.05)
Upvotes: 0