Jakub Bidziński

Reputation: 143

Python - problem with changing values to groups

I have a dataset with several attributes, one of which is temperature. The temperature ranges from about -30 to about 30 degrees. I want to do a machine learning study, so I want to map the temperature to groups on this principle: -30 and below: 0, -30 to -10: 1, and so on. I wrote the code below, but it doesn't work the way I want it to. The data type is int32; I converted it from float64.

dane = [treningowy_df]
for zbior in dane:
    zbior['temperatura'] = zbior['temperatura'].astype(int)
    zbior.loc[ zbior['temperatura'] <= -30, 'temperatura'] = 0
    zbior.loc[(zbior['temperatura'] > -30) & (zbior['temperatura'] <= -10), 'temperatura'] = 1
    zbior.loc[(zbior['temperatura'] > -10) & (zbior['temperatura'] <= 0), 'temperatura'] = 2
    zbior.loc[(zbior['temperatura'] > 0) & (zbior['temperatura'] <= 10), 'temperatura'] = 3
    zbior.loc[(zbior['temperatura'] > 10) & (zbior['temperatura'] <= 20), 'temperatura'] = 4
    zbior.loc[(zbior['temperatura'] > 20) & (zbior['temperatura'] <= 30), 'temperatura'] = 5
    zbior.loc[ zbior['temperatura'] > 30, 'temperatura'] = 6

For example: before the code is executed, record 1 has a temperature of -3, and after the code is applied, record 1 has a temperature of 3. Why? A record with a temperature of 22 before the change has 5 after the change, i.e. that assignment was executed correctly.

Upvotes: 1

Views: 81

Answers (4)

Gabriel

Reputation: 488

It looks like you're manipulating a dataframe. Have you tried using the apply function?

Personally, I would go about it like this, writing the result to a new column.

1. Write a function to process the value

def _check_temperature_range(x):
  if x <= -30:
    return 0
  elif x <= -10:
    return 1
  # so on and so forth...

2. Apply the function onto the column of the dataframe

df[new_column] = df[column].apply(_check_temperature_range)

The results will then appear in new_column, or in the original column if you assign back to the same column.
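
One way to fill in the remaining branches without writing every elif is to look the value up in a sorted list of upper bounds. This is only a sketch: the sample data, the UPPER_BOUNDS name and the temperatura_grupa column are made up for illustration, and bisect_left is swapped in for the if/elif chain (it counts the bounds strictly below x, which matches the "x <= bound" tests).

```python
from bisect import bisect_left

import pandas as pd

# Upper bound of each group, in ascending order (taken from the question).
UPPER_BOUNDS = [-30, -10, 0, 10, 20, 30]

def check_temperature_range(x):
    """Return the group number for a single temperature value."""
    # Number of bounds strictly below x, i.e. the first bound with x <= bound.
    return bisect_left(UPPER_BOUNDS, x)

# Made-up sample data standing in for the real dataset.
df = pd.DataFrame({"temperatura": [-35, -30, -3, 0, 7, 22, 31]})

# Writing to a new (hypothetical) column leaves the original values untouched.
df["temperatura_grupa"] = df["temperatura"].apply(check_temperature_range)
print(df)
```

Here -3 lands in group 2 and 22 in group 5, and the temperatura column still holds the raw readings.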

Upvotes: 4

Hussein Awala

Reputation: 5088

If zbior is a pandas.DataFrame, you can use the map function of the column:

def my_func(x):
    if x <= -30:
        return 0
    elif x <= -10:
        return 1
    elif x <= 0:
        return 2
    elif x <= 10:
        return 3
    elif x <= 20:
        return 4
    elif x <= 30:
        return 5
    else:
        return 6

# map calls my_func once per original value, so no value is regrouped twice
zbior['temperatura'] = zbior['temperatura'].map(my_func)
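
As a vectorized alternative to calling a Python function per row, pandas.cut can do the same binning. This is a sketch under assumptions, not part of the answer above: the sample data and the temperatura_grupa column name are made up, and the bin edges are copied from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for the real dataset.
zbior = pd.DataFrame({"temperatura": [-35, -30, -3, 0, 7, 22, 31]})

# Bin edges; intervals are right-closed by default, e.g. (-10, 0].
bins = [-np.inf, -30, -10, 0, 10, 20, 30, np.inf]

# labels=False returns the integer index of the bin instead of an Interval.
zbior["temperatura_grupa"] = pd.cut(zbior["temperatura"], bins=bins, labels=False)
print(zbior)
```

Because the intervals are closed on the right, -30 falls in group 0 and 0 falls in group 2, the same as the "<=" comparisons in the question.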

Upvotes: 2

Jay

Reputation: 2858

I believe it has to do with the order of the statements in your code.

A record with temperature -3 is first assigned 2:

zbior.loc[(zbior['temperatura'] > -10) & (zbior['temperatura'] <= 0), 'temperatura'] = 2

Then, in the next line, the new value 2 is matched again as being between 0 and 10, and so it is reassigned to 3:

zbior.loc[(zbior['temperatura'] > 0) & (zbior['temperatura'] <= 10), 'temperatura'] = 3

One solution is to assign a temporary number that doesn't make you "jump" a category.

So, for -3, I'd assign 0, which none of the later conditions match, so it sticks around.

After that you can do another pass and change the temporary values to the actual numbers you wanted, e.g. 0->3 etc.
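
The overwrite described above can be reproduced on two values, and one way to avoid it without a second pass is to evaluate every condition on the original values before anything is written. A minimal sketch, assuming made-up sample data and a hypothetical temperatura_grupa column; numpy.select is my substitution here, not the two-pass approach from this answer.

```python
import numpy as np
import pandas as pd

zbior = pd.DataFrame({"temperatura": [-3, 22]})

# Reproduce the problem on a copy: the second line reads what the first wrote.
overwritten = zbior["temperatura"].copy()
overwritten[(overwritten > -10) & (overwritten <= 0)] = 2  # -3 becomes 2
overwritten[(overwritten > 0) & (overwritten <= 10)] = 3   # that 2 becomes 3
print(overwritten.tolist())

# Avoid it: build all masks from the untouched column, then pick the first match.
t = zbior["temperatura"]
conditions = [t <= -30, t <= -10, t <= 0, t <= 10, t <= 20, t <= 30]
choices = [0, 1, 2, 3, 4, 5]
zbior["temperatura_grupa"] = np.select(conditions, choices, default=6)
print(zbior)
```

numpy.select takes the first condition that is true for each row, so the ordered "<=" tests behave like an if/elif chain.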

Upvotes: 2

Adrien Lebas

Reputation: 31

I think your code is being applied multiple times to the same row. Take your example with the first record: temp = -3 gives 2, but then temp = 2 gives 3.

So I recommend creating a new column in your dataframe.
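
A sketch of what that could look like while keeping the .loc style from the question: every condition reads the original temperatura column and every assignment writes to a separate column. The sample data and the column name grupa are made up for illustration.

```python
import pandas as pd

# Made-up sample data standing in for the real dataset.
treningowy_df = pd.DataFrame({"temperatura": [-35, -3, 7, 22, 31]})

temp = treningowy_df["temperatura"]  # only read from, never written to

treningowy_df["grupa"] = 6  # default group: above 30
treningowy_df.loc[temp <= -30, "grupa"] = 0
treningowy_df.loc[(temp > -30) & (temp <= -10), "grupa"] = 1
treningowy_df.loc[(temp > -10) & (temp <= 0), "grupa"] = 2
treningowy_df.loc[(temp > 0) & (temp <= 10), "grupa"] = 3
treningowy_df.loc[(temp > 10) & (temp <= 20), "grupa"] = 4
treningowy_df.loc[(temp > 20) & (temp <= 30), "grupa"] = 5
print(treningowy_df)
```

Since the conditions never look at grupa, a row that has already been labelled cannot be matched again by a later line.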

Upvotes: 2
