suri1617

Reputation: 55

How to make bins from different values of a variable in Python?

I have a data set like the one below, and I want to put the rows into different bins based on the values of smstext.

bindata

  userid      smstext
0 vodafone     56
1 airtel       101
2 reliance     505
3 tata         1500
4 mts          10

What I need is a bin name for each row, based on its smstext value:
- 0-10: bin name 10
- 11-50: bin name 50
- 51-100: bin name 100
- 101-500: bin name 500
- 501-1000: bin name 1000
- above 1000: bin name 1001

Expected output:

bindata

  userid      smstext   bin
0 vodafone     56       100
1 airtel       101      500
2 reliance     505      1000
3 tata         1500     1001
4 mts          10        10

I can solve this with np.where and the np.logical_* functions, but I need a simpler way to do the above in Python. Please help me with this.

Upvotes: 0

Views: 490

Answers (3)

Stuart

Reputation: 9858

I'm new to pandas, but it seems you want the pandas.cut function.

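import numpy as np
import pandas as pd
smstext = np.array([56, 101, 505, 1500, 10])
# right=False makes each bin [low, high), so 1000 -> 1000 and 1001+ -> 1001
bins = pd.cut(smstext, [0, 11, 51, 101, 501, 1001, float('inf')],
    right=False, labels=[10, 50, 100, 500, 1000, 1001])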

This returns a pandas Categorical whose values are:

  100
  500
 1000
 1001
   10
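To get the expected output from the question, the same call can be applied directly to the DataFrame column. This is a minimal sketch, assuming the data lives in a DataFrame named bindata with the columns shown in the question. The second-to-last edge is 1001 so that a count of exactly 1000 still lands in the 1000 bin, as the question specifies.

```python
import pandas as pd

# Sample data from the question (assumed to be a DataFrame named bindata)
bindata = pd.DataFrame({
    "userid": ["vodafone", "airtel", "reliance", "tata", "mts"],
    "smstext": [56, 101, 505, 1500, 10],
})

# Left-closed intervals: [0, 11), [11, 51), [51, 101), [101, 501), [501, 1001), [1001, inf)
edges = [0, 11, 51, 101, 501, 1001, float("inf")]
labels = [10, 50, 100, 500, 1000, 1001]  # one label per interval

bindata["bin"] = pd.cut(bindata["smstext"], edges, right=False, labels=labels)
print(bindata)
```

The new bin column has a categorical dtype; call .astype(int) on it if you need plain integers.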

If for some reason you wanted to write this function yourself rather than using pandas, it would look something like this:

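def cut(values, bins):
    # Each value maps to (first edge greater than it) - 1.
    # Values at or above the last edge fall through to bins[-1],
    # which happens to equal the 1001 label here.
    def categorise(item):
        for right in bins:
            if item < right:
                return right - 1
        return bins[-1]

    return [categorise(item) for item in values]

print(cut(smstext, [0, 11, 51, 101, 501, 1001]))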
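A standard-library alternative (swapped in here, not part of this answer) is the bisect module, which finds the bin with a binary search instead of a linear scan. This sketch assumes integer counts as in the question; the names EDGES, LABELS and bin_label are hypothetical.

```python
from bisect import bisect_right

# Boundaries between bins: a value below 11 is bin 0, 11..50 is bin 1, and so on
EDGES = [11, 51, 101, 501, 1001]
LABELS = [10, 50, 100, 500, 1000, 1001]  # one more label than edges


def bin_label(value):
    # bisect_right counts how many edges are <= value, which is the bin index
    return LABELS[bisect_right(EDGES, value)]


print([bin_label(v) for v in [56, 101, 505, 1500, 10]])  # → [100, 500, 1000, 1001, 10]
```

Keeping the labels in a separate list also avoids relying on the last edge doubling as the overflow label.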

Upvotes: 0

bbrame

Reputation: 18544

Take a look at itertools.groupby.

import itertools

# groupby yields (key, items) pairs; items is an iterator over one group
for key, group in itertools.groupby(dataToBeGrouped, grouperFunction):
    print(key, list(group))

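groupby takes a function that determines the group (key) of each data item and returns an iterator over (key, items) pairs: each group label together with the items in that group. It only merges consecutive items that share a key, so the data usually has to be sorted by the same key function first.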
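Applied to this question, a sketch could look like the following. It assumes plain integer counts, and to_bin is a hypothetical key function that encodes the question's bin rules. Note that this groups the values by bin rather than adding a bin column to each row.

```python
import itertools

smstext = [56, 101, 505, 1500, 10]


def to_bin(value):
    # Hypothetical key function: first upper bound the value does not exceed
    for upper in (10, 50, 100, 500, 1000):
        if value <= upper:
            return upper
    return 1001


# groupby only merges adjacent items with equal keys, so sort by the key first
ordered = sorted(smstext, key=to_bin)
groups = {label: list(items) for label, items in itertools.groupby(ordered, key=to_bin)}
print(groups)  # → {10: [10], 100: [56], 500: [101], 1000: [505], 1001: [1500]}
```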

Upvotes: 0

kdopen

Reputation: 8215

The code to convert a single smstext value to its bin could be:

def convert_to_bin(v, bins, other):
    # Return the first bin upper bound that v does not exceed
    for b in bins:
        if v <= b:
            return b

    # v is larger than every bound
    return other

And it could be called (for your values) as

convert_to_bin(somevalue, [10, 50, 100, 500, 1000], 1001)

Some examples:

In [5]: convert_to_bin(1, [10, 50, 100, 500, 1000], 1001)
Out[5]: 10

In [6]: convert_to_bin(51, [10, 50, 100, 500, 1000], 1001)
Out[6]: 100

In [7]: convert_to_bin(31, [10, 50, 100, 500, 1000], 1001)
Out[7]: 50

In [8]: convert_to_bin(2031, [10, 50, 100, 500, 1000], 1001)
Out[8]: 1001

Then you just have to add the results to the dataset.
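One way to add the results to the dataset is Series.apply. This sketch assumes the data is a pandas DataFrame named bindata as in the question, and includes the 500 bound that the question's bin list requires.

```python
import pandas as pd


def convert_to_bin(v, bins, other):
    # Return the first bin upper bound that v does not exceed
    for b in bins:
        if v <= b:
            return b
    # v is larger than every bound
    return other


# Sample data from the question (assumed to be a DataFrame named bindata)
bindata = pd.DataFrame({
    "userid": ["vodafone", "airtel", "reliance", "tata", "mts"],
    "smstext": [56, 101, 505, 1500, 10],
})

# Call convert_to_bin once per smstext value and store the result as a new column
bindata["bin"] = bindata["smstext"].apply(
    lambda v: convert_to_bin(v, [10, 50, 100, 500, 1000], 1001)
)
print(bindata)
```

apply runs plain Python once per row, which is fine for small data; for large frames the vectorised pandas.cut approach is usually faster.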

Upvotes: 1
