Reputation: 2825
I have a vector of integers representing each character in a domain name and another vector of integers representing timeline information. I need to give both vectors as input to a CNN model that classifies domain names as good or spam.
For instance,
Vector representing the domain name -> 1 x 75 vector. Each element represents one character in the domain name. If there are 1000 domain names, this becomes a matrix of shape 1000 x 75.
Vector representing the timeline information -> 1 x 1440 vector. Each element represents the number of mails sent from a particular domain during one minute. If there are 1000 domain names, this becomes a matrix of shape 1000 x 1440.
How do I input these two vectors to a single CNN model?
My current model takes only the domain name as input:
def build_model(max_features, maxlen):
    """Build CNN model"""
    model = Sequential()
    model.add(Embedding(max_features, 8, input_length=maxlen))
    model.add(Convolution1D(6, 4, border_mode='same'))
    model.add(Convolution1D(4, 4, border_mode='same'))
    model.add(Convolution1D(2, 4, border_mode='same'))
    model.add(Flatten())
    #model.add(Dropout(0.2))
    #model.add(Dense(2,activation='sigmoid'))
    #model.add(Dense(180,activation='sigmoid'))
    #model.add(Dropout(0.2))
    model.add(Dense(2,activation='softmax'))
    sgd = optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['categorical_accuracy', 'f1score', 'precision', 'recall'])
Thanks!
Upvotes: 5
Views: 7656
Reputation: 86600
In convolutions, you need a "length" dimension and a "channels" dimension.
(In 2D, they would be "width", "height" and "channels").
Now, I can't think of any way to relate the 75 characters to the 1440 minutes. (Maybe you can; if you can state how, we may be able to come up with something better.)
Here is what I'm assuming: the two vectors are unrelated to each other, so we'd have two separate inputs:
from keras.layers import *
input1 = Input((75,))
input2 = Input((1440,))
Only the domain name should pass through an embedding layer:
name = Embedding(max_features, 8, input_length=maxlen)(input1)
Now, reshaping to fit the convolutional inputs (None, length, channels).
# the embedding output is already (Batch, 75, 8) -- See: https://keras.io/layers/embeddings/
mails = Reshape((1440,1))(input2) #adding 1 channel at the end
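To make the shape change concrete, here is a minimal NumPy sketch of what adding a channel axis does to the mail-count matrix. The array name `x_mails` and the zero-filled data are stand-ins, not part of the original code; inside the model, the `Reshape((1440,1))` layer does the equivalent per batch.

```python
import numpy as np

# Stand-in data: 1000 domains, 1440 per-minute mail counts each.
x_mails = np.zeros((1000, 1440), dtype="float32")

# Append a trailing axis of size 1 so each minute has exactly one channel.
x_mails_3d = x_mails.reshape((-1, 1440, 1))

print(x_mails.shape)     # (1000, 1440)
print(x_mails_3d.shape)  # (1000, 1440, 1)
```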
Parallel convolutions:
name = Conv1D(6, 4, padding='same')(name)    # filters and kernel size are placeholders; feel free to customize
name = Conv1D(4, 4, padding='same')(name)
mails = Conv1D(6, 4, padding='same')(mails)  # same here: customize as needed
mails = Conv1D(4, 4, padding='same')(mails)
Concatenate - Since they have totally different shapes, maybe we should simply flatten both (or you could think of fancy operations to match them)
name = Flatten()(name)
mails = Flatten()(mails)
out = Concatenate()([name,mails])
# add your extra layers here, each applied as: out = SomeLayer(...)(out)
out = Dense(2,activation='softmax')(out)
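It may help to see how wide the concatenated vector gets when both branches are simply flattened. The sketch below is plain arithmetic, assuming stride-1 convolutions with `padding='same'` and a last convolution with 4 filters in each branch; both are illustrative assumptions, and `flattened_size` is a hypothetical helper.

```python
def flattened_size(length, filters):
    """Number of features Flatten() yields after stride-1 'same'-padded Conv1D layers.

    'same' padding keeps the sequence length unchanged, so only the filter
    count of the last convolution matters.
    """
    return length * filters


LAST_FILTERS = 4  # assumed filter count of the last Conv1D in each branch

name_features = flattened_size(75, LAST_FILTERS)     # domain-name branch
mails_features = flattened_size(1440, LAST_FILTERS)  # mail-count branch
total_features = name_features + mails_features      # width after Concatenate()

print(name_features, mails_features, total_features)  # 300 5760 6060
```

Under these assumptions the mail branch contributes far more features than the name branch, which is worth keeping in mind when choosing filter counts for each side.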
And finally we create the model:
from keras.models import Model
model = Model([input1,input2], out)
Train it like this:
model.fit([xName,xMails], Y, ....)
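Putting the pieces above together, here is a self-contained sketch that can be run end to end. It uses `tensorflow.keras` rather than the bare `keras` import shown above, and the vocabulary size, filter counts, kernel sizes, optimizer and random data are all placeholder assumptions, not values from the question's real dataset.

```python
import numpy as np
from tensorflow.keras import layers, Model

MAX_FEATURES = 40  # assumed vocabulary size (distinct character codes)
NAME_LEN = 75      # characters per domain name, from the question
MAIL_LEN = 1440    # minutes per day, from the question


def build_two_input_model():
    name_in = layers.Input(shape=(NAME_LEN,), dtype="int32", name="domain_name")
    mails_in = layers.Input(shape=(MAIL_LEN,), name="mail_counts")

    # Branch 1: embed the character codes, then convolve along the 75 positions.
    name = layers.Embedding(MAX_FEATURES, 8)(name_in)        # (batch, 75, 8)
    name = layers.Conv1D(6, 4, padding="same", activation="relu")(name)
    name = layers.Conv1D(4, 4, padding="same", activation="relu")(name)
    name = layers.Flatten()(name)                            # (batch, 300)

    # Branch 2: add a channel axis, then convolve along the 1440 minutes.
    mails = layers.Reshape((MAIL_LEN, 1))(mails_in)          # (batch, 1440, 1)
    mails = layers.Conv1D(6, 4, padding="same", activation="relu")(mails)
    mails = layers.Conv1D(4, 4, padding="same", activation="relu")(mails)
    mails = layers.Flatten()(mails)                          # (batch, 5760)

    # Merge both branches and classify as good vs. spam.
    out = layers.Concatenate()([name, mails])                # (batch, 6060)
    out = layers.Dense(2, activation="softmax")(out)

    model = Model([name_in, mails_in], out)
    model.compile(loss="categorical_crossentropy", optimizer="sgd",
                  metrics=["categorical_accuracy"])
    return model


model = build_two_input_model()

# Random stand-in data for 16 domains; replace with the real matrices.
rng = np.random.default_rng(0)
x_name = rng.integers(0, MAX_FEATURES, size=(16, NAME_LEN))
x_mails = rng.integers(0, 50, size=(16, MAIL_LEN)).astype("float32")
y = np.eye(2)[rng.integers(0, 2, size=16)]  # one-hot labels

model.fit([x_name, x_mails], y, epochs=1, batch_size=8, verbose=0)
preds = model.predict([x_name, x_mails], verbose=0)
print(preds.shape)  # (16, 2)
```

The inputs are passed as a list in the same order they were given to `Model`, which is the only change needed in the training call compared with a single-input model.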
Upvotes: 5
Reputation: 831
You could build a multi-input network using Keras' functional API. Build a separate 1D convolutional network for each input. Then concatenate the outputs of these networks and pass the concatenated vector into shared fully-connected layers that sit on top of both branches.
https://keras.io/getting-started/functional-api-guide/#multi-input-and-multi-output-models
Upvotes: 2