Kamal Pandey
Reputation: 300

How does the quantile() function from pandas work in Python?

I came across a function called quantile() in pandas. Can somebody explain how this function works and what it does? An example would be much appreciated. Below is some sample code showing how I am using it.

Code I have so far:

def get_quantile_based_buckets(feature_values, num_buckets):
    # num_buckets evenly spaced quantile levels strictly between 0 and 1
    levels = [(i + 1.) / (num_buckets + 1.) for i in range(num_buckets)]
    quantiles = feature_values.quantile(levels)
    print(quantiles)
    return [quantiles[q] for q in quantiles.keys()]

Here feature_values is a pandas DataFrame. This is an example of the function in use:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
...                   columns=['a', 'b'])
>>> df.quantile(.1)
a    1.3
b    3.7
dtype: float64

>>> df.quantile([.1, .5])
       a     b
0.1  1.3   3.7
0.5  2.5  55.0

If someone could explain the above example, that would be great. If anything in the question is unclear, please ask in the comments.

Upvotes: 0

Views: 1271

Answers (1)

Mihir Sheth
Reputation: 153

Consider a simple example where the given set of values is {1, 8, 9, 4, 2}.

Step 1: Sort the values in ascending order, i.e. {1, 2, 4, 8, 9}.

Step 2: Let n be the total number of values in the set and q be the quantile. Calculate temp = (n-1)*q. Here q = 0.3 (30%) and n = 5, so temp = (5-1)*0.3 = 1.2.

Step 3: Look at the indices floor(temp) and ceil(temp) in the sorted set. Note that indexing of elements starts from 0.

Step 4: Here those are the values at index 1 and 2, i.e. 2 and 4. The quantile function in pandas offers several interpolation methods, such as linear, lower and higher. For linear (the default), the quantile is i + (j-i)*f, where i is the value at index floor(temp), j is the value at index ceil(temp), and f is the fractional part of temp (not q itself). Here i = 2, j = 4 and f = 1.2 - 1 = 0.2, so quantile = 2 + (4-2)*0.2 = 2.4.
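The steps above can be sketched in a few lines of Python and checked against pandas. This is only an illustration of linear interpolation as described in this answer, not the actual pandas implementation; the helper name linear_quantile is made up for this example.

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: quantile with linear interpolation."""
    ordered = sorted(values)                  # step 1: sort ascending
    pos = (len(ordered) - 1) * q              # step 2: temp = (n-1)*q
    lo, hi = math.floor(pos), math.ceil(pos)  # step 3: neighbouring indices
    frac = pos - lo                           # fractional part of temp
    # step 4: interpolate between the two neighbouring values
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


values = [1, 8, 9, 4, 2]
s = pd.Series(values)

manual = linear_quantile(values, 0.3)
from_pandas = s.quantile(0.3)  # interpolation='linear' is the default

# 'lower' and 'higher' return the value at floor(temp) and ceil(temp)
lower = s.quantile(0.3, interpolation='lower')
higher = s.quantile(0.3, interpolation='higher')

print(manual, from_pandas, lower, higher)
```

Both manual and from_pandas come out at roughly 2.4 (up to floating-point rounding), while the lower and higher variants return the two neighbouring values 2 and 4.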

Upvotes: 1
