Reputation: 300
I came across a function called quantile() in pandas. Can somebody explain how this function works and what it does? An example would be greatly appreciated.
I am writing some sample code to better understand this function.
Code I have so far:
def get_quantile_based_buckets(feature_values, num_buckets):
    quantiles = feature_values.quantile([(i+1.)/(num_buckets+1.) for i in list(range(num_buckets))])
    print(quantiles)
    return [quantiles[q] for q in quantiles.keys()]
Here feature_values is a pandas DataFrame.
Here is an example to explain this function:
>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
...                   columns=['a', 'b'])
>>> df.quantile(.1)
a    1.3
b    3.7
dtype: float64
>>> df.quantile([.1, .5])
       a     b
0.1  1.3   3.7
0.5  2.5  55.0
If someone could explain the above example, that would be great. If anything in the question is unclear, please ask in the comments.
Upvotes: 0
Views: 1271
Reputation: 153
Consider a simple example where the given set of values is {1, 8, 9, 4, 2}.
Step 1: First sort the values in ascending order, i.e. {1, 2, 4, 8, 9}.
Step 2: Let n be the total number of values in the set and q be the quantile. Calculate temp = (n-1)*q. Say q = 0.3 (30%) and n = 5, so temp = (5-1)*0.3 = 1.2.
Step 3: Look at the indices floor(temp) and ceil(temp) in the sorted set. Note that indexing of elements starts from 0.
Step 4: Here that means the values at index 1 and 2, i.e. 2 and 4. The quantile function provided by pandas supports several interpolation methods, such as linear, lower, higher, etc. For linear, the quantile value is i + (j-i)*f, where i is the value at index floor(temp), j is the value at index ceil(temp), and f = temp - floor(temp) is the fractional part of temp (not q itself). Here i = 2, j = 4 and f = 1.2 - 1 = 0.2, so quantile = 2 + (4-2)*0.2 = 2.4.
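To check the steps by hand, here is a minimal sketch that redoes the calculation in plain Python and compares it with pandas. The helper name linear_quantile is made up for illustration and is not part of pandas. It uses the fractional part of temp, i.e. temp - floor(temp), as the interpolation weight. The interpolation="lower" and interpolation="higher" calls are included only to show the two neighbouring values that linear interpolation blends.

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: quantile via linear interpolation on sorted values."""
    ordered = sorted(values)              # step 1: sort ascending
    temp = (len(ordered) - 1) * q         # step 2: fractional position
    lo = math.floor(temp)                 # step 3: neighbouring indices
    hi = math.ceil(temp)
    frac = temp - lo                      # weight is the fractional part of temp
    # step 4: blend the two neighbouring values
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


values = [1, 8, 9, 4, 2]
s = pd.Series(values)

manual = linear_quantile(values, 0.3)
from_pandas = s.quantile(0.3)                       # default interpolation is 'linear'
lower = s.quantile(0.3, interpolation="lower")      # value at index floor(temp)
higher = s.quantile(0.3, interpolation="higher")    # value at index ceil(temp)

print(manual, from_pandas, lower, higher)
```

Both manual and from_pandas should agree up to floating-point rounding, and they always fall between lower and higher.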
Upvotes: 1