Mahad_Akbar

Reputation: 13

How to train on unlabeled data in machine learning?

I have data for almost 9000 entities, and I want to train a model to detect anomalies in that data.

I tried a few things to get this working; one of them is:

import numpy as np

def create_sequences(values, time_steps=TIME_STEPS):
    # Slide a window of length time_steps over the series and stack
    # the windows into an array of shape (samples, time_steps).
    output = []
    for i in range(len(values) - time_steps):
        output.append(values[i : (i + time_steps)])
    return np.stack(output)

Here I build my training sequences:

# Assigning each feature to x_train in turn would overwrite the previous
# one, so the features are stacked into a single array of shape
# (samples, time_steps, n_features) instead.
features = ['HR', 'PULSE', 'SpO2', 'ABPDias', 'ABPMean', 'RESP']
x_train = np.stack([create_sequences(data[c].values) for c in features], axis=2)

And here is my model for training:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv1D, Conv1DTranspose, Dropout,
                                     LSTM, MaxPooling1D)

model = Sequential()
# Encoder: strided convolutions downsample the sequence.
model.add(Conv1D(filters=32, kernel_size=7, padding="same", strides=2,
                 input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(MaxPooling1D(pool_size=1, padding="valid"))
model.add(Dropout(0.2))
model.add(Conv1D(filters=16, kernel_size=7, padding="same", strides=2))
model.add(LSTM(units=20, return_sequences=True))
model.add(Dropout(0.2))
# Decoder: transposed convolutions upsample back to the input length.
model.add(Conv1DTranspose(filters=16, kernel_size=7, padding="same", strides=2))
model.add(Conv1D(filters=32, kernel_size=7, padding="same"))
model.add(MaxPooling1D(pool_size=2, padding="valid"))
model.add(Conv1DTranspose(filters=32, kernel_size=7, padding="same", strides=4, activation="relu"))
# Reconstruct one output channel per input feature.
model.add(Conv1DTranspose(filters=x_train.shape[2], kernel_size=7, padding="same"))

model.compile(optimizer="adam", loss="mse")

model.summary()



history = model.fit(
    x_train,
    x_train,  # the autoencoder is trained to reconstruct its own input
    epochs=150,
    batch_size=128,
    validation_split=0.1,
)

But this takes a lot of time. What am I missing? Can anyone guide me?

One more thing: should I use train_test_split on unlabeled data?

Upvotes: 0

Views: 281

Answers (2)

Raha Moosavi

Reputation: 547

You use some layers to encode and then decode the data. The technique you applied is supervised machine learning (ML). Since your dataset is unlabeled, you need to employ unsupervised ML approaches. Clustering is a technique for finding patterns in unlabeled data with many dimensions. There are two approaches to clustering-based anomaly detection (a sketch of the first follows below):

1. Unsupervised clustering, where the anomaly detection model is trained on unlabeled data that contains both normal and attack traffic.
2. Semi-supervised clustering, where the model is trained on normal data only, to build a profile of normal activity.
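For the first approach, a minimal sketch using k-means from scikit-learn, where points far from their cluster centre are flagged as anomaly candidates. The feature list mirrors the question's columns, and the cluster count and 99th-percentile cutoff are assumptions you would tune for your data:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical feature columns, mirroring the question's DataFrame.
features = ['HR', 'PULSE', 'SpO2', 'ABPDias', 'ABPMean', 'RESP']
X = StandardScaler().fit_transform(data[features].values)

# Cluster the unlabeled data; n_clusters=5 is a placeholder, not a tuned value.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Distance of each point to its assigned cluster centre.
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Flag the most distant points as anomaly candidates (the cutoff is arbitrary).
anomalies = data[dists > np.percentile(dists, 99)]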

Upvotes: 0

tolgayan

Reputation: 138

You cannot do supervised learning without labeled data, and using the same features as both input and label is not advisable. What you are looking for is clustering-based anomaly detection, which falls under unsupervised learning. DBSCAN, which is available in scikit-learn, might be a good choice for this task.
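A minimal sketch of how that could look, assuming the question's DataFrame and placeholder eps/min_samples values that would need tuning (for example with a k-distance plot):

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# Hypothetical feature columns, taken from the question's DataFrame.
features = ['HR', 'PULSE', 'SpO2', 'ABPDias', 'ABPMean', 'RESP']
X = StandardScaler().fit_transform(data[features].values)

# eps and min_samples are placeholders, not tuned values.
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)

# DBSCAN assigns the label -1 to points that fit no cluster;
# treat those as anomaly candidates.
anomalies = data[labels == -1]
print(f"{len(anomalies)} potential anomalies out of {len(data)} rows")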

Upvotes: 1
