Adirtha1704

Reputation: 359

Encoding numeric nominal values in machine learning

I am working on a machine learning problem based on network data, where one of the columns in my dataset is Destination Port, with values like 30, 80, 1024, etc.

Since the numeric values in this column are not ordinal, how should I transform this column so that I can use it as an input to a machine learning model? The column has about 480 unique ports.

Upvotes: 1

Views: 381

Answers (2)

Reuben

Reputation: 477

Since Destination Port is a nominal feature, it can be encoded using either label encoding or one-hot encoding.

Label encoding

Advantage: No increase in dimension
Disadvantage: The integer codes imply an ordering that the model may treat as meaningful

One-hot encoding

Advantage: No ordinal effect on the model
Disadvantage: Increases dimensionality (one new column per unique value, i.e. about 480 columns for this port column)
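A minimal sketch of the two options in plain Python (no libraries), using a small hypothetical list of ports; the function names `label_encode` and `one_hot_encode` are illustrative, not from any library. In practice, pandas `get_dummies` or scikit-learn's `OneHotEncoder` do the same job on a DataFrame column.

```python
# Hypothetical sample of Destination Port values (the real column has ~480 unique ports).
ports = [80, 443, 30, 80, 1024, 443]


def label_encode(values):
    """Map each distinct value to an integer code (categories sorted for reproducibility)."""
    categories = sorted(set(values))
    code_of = {cat: i for i, cat in enumerate(categories)}
    return [code_of[v] for v in values], categories


def one_hot_encode(values):
    """Return one 0/1 row per value, with one column per distinct value."""
    categories = sorted(set(values))
    col_of = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[col_of[v]] = 1  # exactly one "hot" position per row
        rows.append(row)
    return rows, categories


codes, cats = label_encode(ports)
print(cats)   # [30, 80, 443, 1024]
print(codes)  # [1, 2, 0, 1, 3, 2]  -> one column, but the codes look ordered

rows, _ = one_hot_encode(ports)
print(rows[0])  # [0, 1, 0, 0]  -> port 80; row width grows with the number of ports
```

Label encoding keeps a single column, while one-hot encoding produces as many columns as there are distinct ports (about 480 here). If ports that never appeared in training can show up at prediction time, scikit-learn's `OneHotEncoder(handle_unknown="ignore")` encodes them as an all-zero row instead of raising an error.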

Upvotes: 1

Sadak

Reputation: 911

It's called normalization, and its goal is to rescale the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values or losing information.

 from sklearn import preprocessing
 import pandas as pd

 # Take the column's values as a 2-D float array (scikit-learn expects 2-D input)
 x = df[['name_of_ur_column']].values.astype(float)

 # Create a min-max scaler, which rescales each column to the [0, 1] range
 min_max_scaler = preprocessing.MinMaxScaler()

 # Fit the scaler (learn min and max) and transform the data in one step
 x_scaled = min_max_scaler.fit_transform(x)

 # Wrap the scaled array back into a DataFrame
 df_normalized = pd.DataFrame(x_scaled)

Alternatively, you can do it directly in pandas without scikit-learn. To standardize each column (z-score):

 normalized_df = (df - df.mean()) / df.std()

To use min-max normalization instead:

 normalized_df = (df - df.min()) / (df.max() - df.min())
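As a worked sketch, the two formulas can be applied by hand to the example port values from the question (30, 80, 1024), using only the Python standard library. This assumes a plain list rather than a DataFrame; `statistics.stdev` is the sample standard deviation, which matches the default of pandas' `df.std()`.

```python
import statistics

# Example port values taken from the question.
ports = [30, 80, 1024]

# Min-max normalization: (x - min) / (max - min)
lo, hi = min(ports), max(ports)
min_max = [(p - lo) / (hi - lo) for p in ports]

# Z-score standardization: (x - mean) / std, with the sample standard deviation
mu = statistics.mean(ports)
sigma = statistics.stdev(ports)
z_scores = [(p - mu) / sigma for p in ports]

print(min_max)
print(z_scores)
```

Min-max maps the smallest port to 0.0 and the largest to 1.0, and z-scores are centred on 0. Both transformations keep the ports in their original numeric order.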

Upvotes: 2
