Reputation: 359
I am working on a machine learning problem with network data, where one of the columns in my dataset is Destination Port, with values such as 30, 80, 1024, etc.
Since the numeric values in this column are not ordinal, how can I transform this column so that I can use it as an input to a machine learning model? The column has about 480 unique ports.
Upvotes: 1
Reputation: 477
Since Destination Port is a nominal feature, it can be encoded using either label encoding or one-hot encoding.
Label encoding
Advantage: No increase in dimensionality
Disadvantage: The integer codes impose an artificial order on the ports, which the model may treat as meaningful
One-hot encoding
Advantage: No artificial ordering is imposed on the model
Disadvantage: Increases dimensionality (one new column per unique port, so about 480 columns here)
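A minimal sketch of both encodings with pandas, assuming the data sits in a DataFrame with a column named "Destination Port". The toy port values below are hypothetical and only for illustration.

```python
import pandas as pd

# Hypothetical toy data: a few destination ports (illustration only)
df = pd.DataFrame({"Destination Port": [80, 443, 80, 1024, 30]})

# Label encoding: map each distinct port to an integer code (0, 1, 2, ...)
# pd.factorize assigns codes in order of first appearance
codes, uniques = pd.factorize(df["Destination Port"])
df["port_label"] = codes

# One-hot encoding: one 0/1 column per distinct port, named e.g. "port_80"
one_hot = pd.get_dummies(df["Destination Port"], prefix="port", dtype=int)

print(df)
print(one_hot)
```

If the model is later applied to new data, the same port-to-code mapping (or the same set of one-hot columns) has to be reused. scikit-learn's OneHotEncoder with handle_unknown="ignore" is one option for this, since it encodes ports not seen during training as all zeros.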
Upvotes: 1
Reputation: 911
It's called normalization: the goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values or losing information.
from sklearn import preprocessing
import pandas as pd

# Convert the column to a 2-D array of floats (scikit-learn scalers expect 2-D input)
x = df[['name_of_ur_column']].values.astype(float)
# Create a min-max scaler object
min_max_scaler = preprocessing.MinMaxScaler()
# Fit the scaler to the data and rescale it to the range [0, 1]
x_scaled = min_max_scaler.fit_transform(x)
# Wrap the scaled values back into a DataFrame
df_normalized = pd.DataFrame(x_scaled)
Alternatively, you can do it directly in pandas. For z-score (standard) normalization:
normalized_df = (df - df.mean()) / df.std()
For min-max normalization:
normalized_df = (df - df.min()) / (df.max() - df.min())
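A self-contained version of the min-max snippets above, using the example port values from the question as a hypothetical toy DataFrame, shows that scikit-learn's MinMaxScaler and the pandas formula give the same result.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy column built from the example ports in the question
df = pd.DataFrame({"Destination Port": [30, 80, 1024]})

# scikit-learn: MinMaxScaler rescales each column to [0, 1]
scaler = MinMaxScaler()
x_scaled = scaler.fit_transform(df[["Destination Port"]].astype(float))

# Equivalent pandas expression for min-max normalization
col = df["Destination Port"]
pandas_scaled = (col - col.min()) / (col.max() - col.min())

print(x_scaled.ravel())
print(pandas_scaled.tolist())
```

Here port 80 maps to (80 - 30) / (1024 - 30) = 50 / 994. Min-max scaling only changes the range; the ports keep their numeric order and relative spacing.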
Upvotes: 2