Reputation: 143
I have a dataset with several attributes, one of which is temperature. The temperature ranges from about -30 to about 30 degrees. For a machine learning study I want to bin the temperature into groups on this principle: below -30: 0, -30 to -10: 1, and so on. I wrote the code below, but it doesn't work the way I want it to. The column's data type is int32; I converted it from float64.
dane = [treningowy_df]
for zbior in dane:
    zbior['temperatura'] = zbior['temperatura'].astype(int)
    zbior.loc[zbior['temperatura'] <= -30, 'temperatura'] = 0
    zbior.loc[(zbior['temperatura'] > -30) & (zbior['temperatura'] <= -10), 'temperatura'] = 1
    zbior.loc[(zbior['temperatura'] > -10) & (zbior['temperatura'] <= 0), 'temperatura'] = 2
    zbior.loc[(zbior['temperatura'] > 0) & (zbior['temperatura'] <= 10), 'temperatura'] = 3
    zbior.loc[(zbior['temperatura'] > 10) & (zbior['temperatura'] <= 20), 'temperatura'] = 4
    zbior.loc[(zbior['temperatura'] > 20) & (zbior['temperatura'] <= 30), 'temperatura'] = 5
    zbior.loc[zbior['temperatura'] > 30, 'temperatura'] = 6
For example: before the code runs, record 1 has a temperature of -3, and after the code runs it has a temperature of 3. Why? Another record has a temperature of 22 before the change and 5 after it, so that assignment was executed correctly.
Upvotes: 1
Views: 81
Reputation: 488
It looks like you're manipulating a DataFrame. Have you tried using the apply function?
Personally, I would go about it as follows (and write the result to a new column).
1. Write a function to process the value
def _check_temperature_range(x):
    if x <= -30:
        return 0
    elif x <= -10:
        return 1
    # so on and so forth...
2. Apply the function onto the column of the dataframe
df[new_column] = df[column].apply(_check_temperature_range)
The results will then appear in new_column, or in the original column if you assign back to the same column.
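Here is a rough end-to-end sketch of those two steps, in case it helps. The elided branches are filled in from the bins in the question, but instead of spelling out every elif I look the value up with bisect_left from the standard library; that is my substitution, not part of the answer above. The sample values and the column name temperatura_grupa are made up for illustration.

```python
from bisect import bisect_left

import pandas as pd

# Upper edges of the bins described in the question; each edge is inclusive.
UPPER_EDGES = [-30, -10, 0, 10, 20, 30]


def check_temperature_range(x):
    """Return the group code (0 to 6) for a single temperature value."""
    # bisect_left counts the edges strictly below x, so x == -30 gives 0
    # and x == -29 gives 1, matching a chain of "x <= edge" branches.
    return bisect_left(UPPER_EDGES, x)


# Made-up sample values standing in for the real DataFrame.
df = pd.DataFrame({"temperatura": [-35, -30, -3, 0, 7, 22, 31]})

# Store the codes in a new (hypothetical) column; the readings stay intact.
df["temperatura_grupa"] = df["temperatura"].apply(check_temperature_range)
groups = df["temperatura_grupa"].tolist()
print(groups)  # [0, 0, 2, 2, 3, 5, 6]
```

Because the codes go into a separate column, no reading is ever compared against a code that was written earlier.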
Upvotes: 4
Reputation: 5088
If zbior is a pandas.DataFrame, you can use the map function
def my_func(x):
    if x <= -30:
        return 0
    elif x <= -10:
        return 1
    elif x <= 0:
        return 2
    elif x <= 10:
        return 3
    elif x <= 20:
        return 4
    elif x <= 30:
        return 5
    else:
        return 6
zbior.temperatura = zbior.temperatura.map(my_func)
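As a possible vectorized alternative to map (a different technique from the one in this answer), pandas.cut can do the same binning in one call. This is only a sketch: the sample values and the column name temperatura_grupa are invented, and the edges are taken from the bins in the question. With the default right=True each interval includes its upper edge, which matches the "<=" comparisons in my_func.

```python
import numpy as np
import pandas as pd

# Made-up sample values standing in for zbior.
zbior = pd.DataFrame({"temperatura": [-35, -30, -3, 0, 7, 22, 31]})

# Bin edges from the question; infinite outer edges catch everything
# at or below -30 and everything above 30.
edges = [-np.inf, -30, -10, 0, 10, 20, 30, np.inf]

# labels=False returns the integer position of the bin instead of an
# Interval label; intervals are closed on the right by default.
zbior["temperatura_grupa"] = pd.cut(zbior["temperatura"], bins=edges, labels=False)
codes = zbior["temperatura_grupa"].tolist()
print(codes)  # [0, 0, 2, 2, 3, 5, 6]
```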
Upvotes: 2
Reputation: 2858
I believe it has to do with the order of the statements in your code.
A record with temperature -3 is first assigned 2:
zbior.loc[(zbior['temperatura'] > -10) & (zbior['temperatura'] <= 0), 'temperatura'] = 2
Then, on the next line, the new value 2 is found to be between 0 and 10, so the record is reassigned to 3:
zbior.loc[(zbior['temperatura'] > 0) & (zbior['temperatura'] <= 10), 'temperatura'] = 3
One solution is to assign a temporary number that no later condition matches, so the record doesn't "jump" a category.
So, for -3, I'd assign 0 so it sticks around.
After that you can do another pass and change the temporary numbers to the ones you actually wanted, e.g. 0->3 etc.
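To make the cascade and the two-pass idea concrete, here is a small sketch. The helper recode_in_place, the sample values, and the choice of negative placeholders (-100, -101, ...) are all my own assumptions; the bins are the ones from the question. Negative placeholders are used because they sit below every later lower bound, so no later condition can pick them up again.

```python
import pandas as pd

# Bins from the question: (lower bound, exclusive; upper bound, inclusive; code).
BINS = [
    (float("-inf"), -30, 0),
    (-30, -10, 1),
    (-10, 0, 2),
    (0, 10, 3),
    (10, 20, 4),
    (20, 30, 5),
    (30, float("inf"), 6),
]


def recode_in_place(values, written_value):
    """Overwrite one Series bin by bin, in the same order as the question."""
    s = pd.Series(values, dtype="int64")
    for low, high, code in BINS:
        # The mask is evaluated on the current, partly overwritten values.
        s.loc[(s > low) & (s <= high)] = written_value(code)
    return s


temps = [-35, -20, -3, 7, 15, 22, 31]

# Writing the final codes directly: earlier codes are matched again later.
naive = recode_in_place(temps, lambda code: code).tolist()
print(naive)  # [3, 3, 3, 3, 4, 5, 6]

# Pass 1 writes placeholders that no later bin matches; pass 2 decodes them.
first_pass = recode_in_place(temps, lambda code: -100 - code)
two_pass = (-first_pass - 100).tolist()
print(two_pass)  # [0, 1, 2, 3, 4, 5, 6]
```

In the naive run every value at or below 10 ends up as 3, which is the same effect that turns -3 into 3 in the question.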
Upvotes: 2
Reputation: 31
I think your code is being applied multiple times to the same row. In your example, the first matching line turns temp = -3 into 2, but then temp = 2 is turned into 3 by the next line.
So I recommend creating a new column in your DataFrame.
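A minimal sketch of the new-column approach, assuming the bins from the question; the sample values and the column name temperatura_grupa are made up. Every condition reads the untouched temperatura column and writes only to the new one, so an assigned code is never tested again.

```python
import pandas as pd

# Made-up sample values standing in for treningowy_df.
df = pd.DataFrame({"temperatura": [-35, -20, -3, 7, 15, 22, 31]})
temp = df["temperatura"]  # source column, only read below

# Start with code 0 (temperature <= -30) and overwrite the other bins.
df["temperatura_grupa"] = 0
df.loc[(temp > -30) & (temp <= -10), "temperatura_grupa"] = 1
df.loc[(temp > -10) & (temp <= 0), "temperatura_grupa"] = 2
df.loc[(temp > 0) & (temp <= 10), "temperatura_grupa"] = 3
df.loc[(temp > 10) & (temp <= 20), "temperatura_grupa"] = 4
df.loc[(temp > 20) & (temp <= 30), "temperatura_grupa"] = 5
df.loc[temp > 30, "temperatura_grupa"] = 6

new_codes = df["temperatura_grupa"].tolist()
print(new_codes)  # [0, 1, 2, 3, 4, 5, 6]
```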
Upvotes: 2