Reputation: 55
I have a data set like the one below, and I want to assign each row to a bin based on its smstext value.
bindata
userid smstext
0 vodafone 56
1 airtel 101
2 reliance 505
3 tata 1500
4 mts 10
What I need: if the smstext value is between 0 and 10, the bin name should be 10; between 11 and 50, the bin name should be 50; between 51 and 100, it should be 100; between 101 and 500, it should be 500; between 501 and 1000, it should be 1000; and above 1000, it should be 1001.
Expected output:
bindata
userid smstext bin
0 vodafone 56 100
1 airtel 101 500
2 reliance 505 1000
3 tata 1500 1001
4 mts 10 10
I can solve this with np.where and NumPy's logical functions, but I need a simpler way to do it in Python. Please help me with this.
Upvotes: 0
Views: 490
Reputation: 9858
I'm new to pandas, but it seems you want the cut function.
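import numpy as np
import pandas as pd
smstext = np.array([56, 101, 505, 1500, 10])
# Left-closed bins: [0, 11), [11, 51), [51, 101), [101, 501), [501, 1001), [1001, inf)
bins = pd.cut(smstext, [0, 11, 51, 101, 501, 1001, float('inf')],
              right=False, labels=[10, 50, 100, 500, 1000, 1001])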
This returns
100
500
1000
1001
10
If for some reason you wanted to write this function yourself rather than using pandas, it would look something like this:
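def cut(values, bins):
    def categorise(item):
        # bins holds exclusive upper edges; the label is the edge minus one
        for right in bins:
            if item < right:
                return right - 1
        # anything at or past the last edge gets the overflow label
        return bins[-1]
    return [categorise(item) for item in values]
print(cut(smstext, [0, 11, 51, 101, 501, 1001]))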
Upvotes: 0
Reputation: 18544
Take a look at itertools.groupby.
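import itertools
# sort first so that items with equal labels are adjacent
dataToBeGrouped = sorted(dataToBeGrouped, key=grouperFunction)
for group, dataInGroup in itertools.groupby(dataToBeGrouped, grouperFunction):
    print(group, list(dataInGroup))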
groupby takes a function that determines the group of a data item and returns an iterator that yields each group label together with the items in that group. It only groups consecutive items, so the data must already be sorted by that label.
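As a rough sketch of how that could look for the data in the question (bin_label, key and rows are names made up for this illustration, and the bin edges are taken from the question's description):

```python
import itertools

# Hypothetical helper: maps an SMS count to the bin label from the question.
def bin_label(count):
    for upper in (10, 50, 100, 500, 1000):
        if count <= upper:
            return upper
    return 1001  # label for anything above 1000

# Sample rows from the question as (userid, smstext) pairs.
rows = [("vodafone", 56), ("airtel", 101), ("reliance", 505),
        ("tata", 1500), ("mts", 10)]

def key(row):
    return bin_label(row[1])

# groupby only merges adjacent items, so sort by the same key first.
grouped = {
    label: [userid for userid, _ in members]
    for label, members in itertools.groupby(sorted(rows, key=key), key=key)
}
print(grouped)
```

Note that this groups users by bin rather than adding a bin column, so it may be more than the question actually needs.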
Upvotes: 0
Reputation: 8215
The code to convert one value of smstext to its bin would be
def convert_to_bin(v, bins, other):
    for b in bins:
        if v <= b:
            return b
    return other
It could be called (for your values) as
convert_to_bin(somevalue, [10, 50, 100, 500, 1000], 1001)
Some examples:
In [5]: convert_to_bin(1, [10, 50, 100, 500, 1000], 1001)
Out[5]: 10
In [6]: convert_to_bin(51, [10, 50, 100, 500, 1000], 1001)
Out[6]: 100
In [7]: convert_to_bin(31, [10, 50, 100, 500, 1000], 1001)
Out[7]: 50
In [8]: convert_to_bin(2031, [10, 50, 100, 500, 1000], 1001)
Out[8]: 1001
Then you just have to add the results to the dataset.
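A minimal sketch of that last step, assuming the dataset is held as a plain list of dicts (records, bin_for, UPPER_BOUNDS and OVERFLOW are names invented for this example). It swaps the linear loop for the standard library's bisect module, which does the same lookup on a sorted list of upper bounds:

```python
import bisect

UPPER_BOUNDS = [10, 50, 100, 500, 1000]  # inclusive upper edge of each bin
OVERFLOW = 1001                          # label for values above 1000

def bin_for(value):
    # bisect_left gives the index of the first bound that is >= value
    i = bisect.bisect_left(UPPER_BOUNDS, value)
    return UPPER_BOUNDS[i] if i < len(UPPER_BOUNDS) else OVERFLOW

# The rows from the question, assumed here to be a list of dicts.
records = [
    {"userid": "vodafone", "smstext": 56},
    {"userid": "airtel", "smstext": 101},
    {"userid": "reliance", "smstext": 505},
    {"userid": "tata", "smstext": 1500},
    {"userid": "mts", "smstext": 10},
]

# Add the bin to every row.
for record in records:
    record["bin"] = bin_for(record["smstext"])

for record in records:
    print(record["userid"], record["smstext"], record["bin"])
```

With a pandas DataFrame the same function could presumably be applied to the smstext column instead of looping over dicts.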
Upvotes: 1