Reputation: 81
Input:
Part no | A | B | C | D |
---|---|---|---|---|
A1 | 0.25 | 0.2 | 0.3 | 0.4 |
A2 | 0.26 | 0.3 | 0.3 | 0.4 |
A3 | 0.3 | 0.3 | 0.3 | 0.3 |
A4 | 0.7 | 0.3 | 0.3 | 0.3 |
A5 | 0.8 | 0.4 | 0.45 | 0.46 |
I have to create set for duplicates on the column A with the tolerance of +/-0.1
Expected output
Part no | A | B | C | D | Set |
---|---|---|---|---|---|
A1 | 0.25 | 0.2 | 0.3 | 0.4 | 1 |
A2 | 0.26 | 0.3 | 0.3 | 0.4 | 1 |
A3 | 0.3 | 0.3 | 0.3 | 0.3 | 1 |
A4 | 0.7 | 0.3 | 0.3 | 0.3 | 2 |
A5 | 0.8 | 0.4 | 0.45 | 0.46 | 2 |
Upvotes: 1
Views: 71
Reputation: 786
Would something like this work for you? The last example of 0.7 and 0.8 is a bit strange anyways. Why would you like them in the same group?
Reference to my_ciel
answer since nor python nor numpy has the builtin functionality to do a cieling round on arbitrary float positions: my_ciel answer
import numpy as np
import pandas as pd
# Some boilerplate to make your sample data reusable (I suggest u post your questions in such way
# that data can be easily coppied)
a = np.array([0.25,0.2,0.3,0.4,0.26,0.3,0.3,0.4,0.3,0.3,0.3,0.3,0.7,0.3,0.3,0.3,0.8,0.4,0.45,0.46,])
a = a.reshape(5,4)
df = pd.DataFrame(a)
df.columns = ["A","B","C","D"]
df["PartNo"] = ["A1","A2","A3","A4","A5"]
df = df[["PartNo","A","B","C","D"]]
# This does a cieling round at the desired float position
def my_ceil(a, precision=0):
return np.true_divide(np.ceil(a * 10**precision), 10**precision)
# Tolerance can be set here:
# 1 = 0.1 , 2 = 0.01 , 3 = 0.001 etc.
df["set"] = my_ceil(df["A"],1)
df["set"] = df["set"].astype('category').cat.codes
df
Upvotes: 1
Reputation: 120401
Use cumsum
to create groups:
# If your dataframe is not sorted by 'A' columns
df = df.sort_values('A')
df['Set'] = df['A'].sub(df['A'].shift()).abs().ge(0.1000000001).cumsum().add(1)
>>> df
Part no A B C D Set
0 A1 0.25 0.2 0.30 0.40 1
1 A2 0.26 0.3 0.30 0.40 1
2 A3 0.30 0.3 0.30 0.30 1
3 A4 0.70 0.3 0.30 0.30 2
4 A5 0.80 0.4 0.45 0.46 2
0.1000000001
is due to float precision. You can also use np.isclose
.
With np.close
:
>>> df['Set'] = np.cumsum(~np.isclose(df['A'], df['A'].shift(), atol=0.1))
Part no A B C D Set
0 A1 0.25 0.2 0.30 0.40 1
1 A2 0.26 0.3 0.30 0.40 1
2 A3 0.30 0.3 0.30 0.30 1
3 A4 0.70 0.3 0.30 0.30 2
4 A5 0.80 0.4 0.45 0.46 2
Upvotes: 2