Reputation: 793
I have a dataframe df:
A
1 12
2 15.5
3 20.5
4 30.5
5 15
x_range = [list(range(0,5)),list(range(6,10)),list(range(11,15)),list(range(15,20)),list(range(21,25))]
def min_max_range(x,y):
    for a in y:
        if int(x) in a:
            min_val = min(a)
            max_val = max(a)+1
            return max_val - min_val
I apply it like this:
df['A'].apply(lambda x: min_max_range(x,x_range))
The result should look like this:
A B
1 12 5
2 15 5
3 20.5 4
4 5.5 4
5 15.5 4
But what I'm getting is this:
A B
1 12 4
2 15 5
3 20.5 NA
4 5.5 NA
5 15.5 NA
I know why this is happening: it's not considering the values in between. range(0,5) = [0,1,2,3,4,5] and range(6,10) = [6,7,8,9,10], so the values between 5 and 6 are not covered. If there is a value like 5.5 or 5.8, it is skipped and NA is returned. How can I avoid this?
Upvotes: 2
Views: 249
Reputation: 863166
It seems the problem is the last value: range(0,5) ends at 4, not 5, because range excludes its stop value. So the last value is missing from each range:
print (list(range(0,5)))
[0, 1, 2, 3, 4]
print (list(range(6,10)))
[6, 7, 8, 9]
print (list(range(11,15)))
[11, 12, 13, 14]
I think it is necessary to add 1 to the second argument of each range, like:
print (list(range(0,6)))
[0, 1, 2, 3, 4, 5]
print (list(range(6,11)))
[6, 7, 8, 9, 10]
print (list(range(11,16)))
[11, 12, 13, 14, 15]
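One way to make this off-by-one harder to hit, sketched here on the assumption that the bins are meant to be contiguous, is to generate the lists from a single list of boundaries. The name `edges` is hypothetical and not part of the original code:

```python
# Hypothetical boundaries: each bin runs from edges[i] up to,
# but not including, edges[i + 1].
edges = [0, 6, 11, 16, 21, 26]

# range(lo, hi) excludes hi, so pairing each boundary with the next one
# yields bins that neither overlap nor leave integer gaps.
x_range = [list(range(lo, hi)) for lo, hi in zip(edges, edges[1:])]

print(x_range[0])
print(x_range[1])
```

Because every stop value is reused as the next start value, no integer can fall between two bins.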
After changing the values there are no NaNs:
x_range = [list(range(0,6)),list(range(6,11)),list(range(11,16)),
list(range(16,21)),list(range(21,26))]
def min_max_range(x,y):
    for a in y:
        if int(x) in a:
            min_val = min(a)
            max_val = max(a)+1
            return max_val - min_val
df['B'] = df['A'].apply(lambda x: min_max_range(x,x_range))
print (df)
A B
1 12.0 5
2 15.0 5
3 20.5 5
4 5.5 6
5 15.5 5
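As a possible alternative, here is a minimal sketch that compares each value against half-open bounds directly instead of building integer lists, so fractional values such as 5.5 need no `int()` truncation. The names `edges` and `bin_width` are hypothetical, and the boundaries are assumed to match the corrected ranges:

```python
# Hypothetical bin boundaries, assumed to match the corrected ranges:
# [0, 6), [6, 11), [11, 16), [16, 21), [21, 26)
edges = [0, 6, 11, 16, 21, 26]

def bin_width(x, edges):
    """Return the width of the half-open bin [lo, hi) containing x, or None."""
    for lo, hi in zip(edges, edges[1:]):
        # lo is included and hi is excluded, so a float like 5.5 matches [0, 6).
        if lo <= x < hi:
            return hi - lo
    # x lies outside every bin.
    return None

# Sample values taken from column A above.
values = [12.0, 15.0, 20.5, 5.5, 15.5]
widths = [bin_width(v, edges) for v in values]
print(widths)
```

On the dataframe this could be applied in the same way as the original function, for example `df['A'].apply(lambda x: bin_width(x, edges))`.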
Upvotes: 2