Reputation: 13
I have data for almost 9000 entities, and I want to train a model on it to detect anomalies.
I tried a few things to get this working; one of them is:
import numpy as np

def create_sequences(values, time_steps=TIME_STEPS):
    # Slide a window of length `time_steps` over the series,
    # producing an array of shape (n_windows, time_steps)
    output = []
    for i in range(len(values) - time_steps):
        output.append(values[i : (i + time_steps)])
    return np.stack(output)
Here I build my training data:
# Window each vital sign and stack them as channels; reassigning
# x_train once per column would keep only the last feature
feature_cols = ['HR', 'PULSE', 'SpO2', 'ABPDias', 'ABPMean', 'RESP']
x_train = np.stack(
    [create_sequences(data[col].values) for col in feature_cols],
    axis=2,
)  # shape: (n_windows, TIME_STEPS, 6)
And here is my model for training:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Conv1DTranspose, Dropout, LSTM, MaxPooling1D

model = Sequential()
# Encoder: strided convolutions downsample the sequence
model.add(Conv1D(filters=32, kernel_size=7, padding="same", strides=2,
                 input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(MaxPooling1D(pool_size=1, padding="valid"))
model.add(Dropout(0.2))
model.add(Conv1D(filters=16, kernel_size=7, padding="same", strides=2))
model.add(LSTM(units=20, return_sequences=True))
model.add(Dropout(0.2))
# Decoder: transposed convolutions upsample back to the input length
model.add(Conv1DTranspose(filters=16, kernel_size=7, padding="same", strides=2))
model.add(Conv1D(filters=32, kernel_size=7, padding="same"))
model.add(MaxPooling1D(pool_size=2, padding="valid"))
model.add(Conv1DTranspose(filters=32, kernel_size=7, padding="same", strides=4, activation="relu"))
# Reconstruct all input channels, not just one
model.add(Conv1DTranspose(filters=x_train.shape[2], kernel_size=7, padding="same"))
model.compile(optimizer="adam", loss="mse")
model.summary()
history = model.fit(
    x_train,
    x_train,  # autoencoder: the target is the input itself
    epochs=150,
    batch_size=128,
    validation_split=0.1,
)
But this took a lot of time. What am I missing? Can anyone guide me?
One more thing: should I use train_test_split for unlabeled data?
Upvotes: 0
Views: 281
Reputation: 547
You use some layers to encode and then decode the data. The technique you applied is supervised machine learning (ML). Since your dataset is unlabeled, you need to employ unsupervised ML approaches. Clustering is a technique for finding patterns in unlabelled, high-dimensional data. There are two different approaches to clustering-based anomaly detection:
1. Unsupervised clustering, where the anomaly detection model is trained on unlabelled data that contains both normal and attack traffic.
2. Semi-supervised clustering, where the model is trained on normal data only, to build a profile of normal activity.
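As a minimal sketch of the first approach, assuming the windowed x_train from your question is flattened to one row per window (the cluster count and the 1% cutoff are illustrative choices, not recommendations):
import numpy as np
from sklearn.cluster import KMeans

X = x_train.reshape(len(x_train), -1)        # one flat row per window

kmeans = KMeans(n_clusters=8, random_state=0).fit(X)
dist = np.min(kmeans.transform(X), axis=1)   # distance to nearest cluster centre
threshold = np.quantile(dist, 0.99)          # flag the farthest 1% of windows
anomalies = np.where(dist > threshold)[0]
Windows far from every cluster centre do not fit any learned pattern of normal behaviour, which is what makes them anomaly candidates.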
Upvotes: 0
Reputation: 138
You cannot do supervised learning without labeled data, and it is not preferable to use the same features as both input and label. What you are looking for is clustering-based anomaly detection, which falls under the category of unsupervised learning. DBSCAN might be a good choice for this task, and it is available in scikit-learn.
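A minimal sketch of that idea, assuming x_train is flattened to one row per window; eps and min_samples are placeholder values you would need to tune for your data:
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

X = x_train.reshape(len(x_train), -1)   # one flat row per window
X = StandardScaler().fit_transform(X)   # DBSCAN is distance-based, so scale first

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
anomalies = np.where(labels == -1)[0]   # DBSCAN marks noise points with label -1
Points labeled -1 belong to no dense cluster, so they are your anomaly candidates.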
Upvotes: 1