brow-joe

Reputation: 171

LSTM Error python keras

Good morning. I'm trying to train an LSTM to classify messages as spam or not spam, and I came across the following error:

ValueError: Input 0 is incompatible with layer lstm_1: expected ndim = 3, found ndim = 4

Can someone help me understand where the problem is?

my code:

import sys
import pandas as pd
import numpy as np
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.feature_extraction.text import CountVectorizer

if __name__ == "__main__":
    np.random.seed(7)

    with open('SMSSpamCollection') as file:
        dataset = [[x.split('\t')[0],x.split('\t')[1]] for x in [line.strip() for line in file]]

    data   = np.array([dat[1] for dat in dataset])
    labels = np.array([dat[0] for dat in dataset])

    dataVectorizer = CountVectorizer(analyzer = "word",  
                             tokenizer = None,   
                             preprocessor = None,
                             stop_words = None,  
                             max_features = 5000) 
    labelVectorizer = CountVectorizer(analyzer = "word",  
                             tokenizer = None,   
                             preprocessor = None,
                             stop_words = None,  
                             max_features = 5000) 

    data = dataVectorizer.fit_transform(data).toarray()
    labels = labelVectorizer.fit_transform(labels).toarray()
    vocab = labelVectorizer.get_feature_names()

    print(vocab)
    print(data)
    print(labels)

    data = np.reshape(data, (data.shape[0], 1, data.shape[1]))

    input_dim = data.shape
    tam = len(data[0])

    print(data.shape)
    print(tam)

    model = Sequential()
    model.add(LSTM(tam, input_shape=input_dim))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(data, labels, epochs=100, batch_size=1, verbose=2)

I tried adding another dimension to the data array, but that did not help either. Here is a sample of my file SMSSpamCollection:

ham Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
ham Ok lar... Joking wif u oni...
spam    Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
ham U dun say so early hor... U c already then say...
ham Nah I don't think he goes to usf, he lives around here though
spam    FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, £1.50 to rcv
ham Even my brother is not like to speak with me. They treat me like aids patent.
...

thanks

Upvotes: 1

Views: 230

Answers (1)

Marcin Możejko

Reputation: 40506

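The problem is that your input_shape includes the samples dimension. Keras adds that dimension itself, so input_shape should describe a single sample only, i.e. (timesteps, features). Since data.shape has three entries, the LSTM layer ends up expecting ndim = 4 input instead of 3. Try: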

input_dim = (data.shape[1], data.shape[2])

This should work.
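
To see where the extra dimension comes from, here is a minimal NumPy-only sketch. The array sizes are made up for illustration, and expected_batch_shape is a hypothetical helper (not a Keras function) that only mimics how Keras prepends the batch dimension to whatever you pass as input_shape:

```python
import numpy as np

# Stand-in for the vectorized messages: 4 samples, 6 features (made-up sizes).
data = np.zeros((4, 6))

# Same reshape as in the question: (samples, timesteps=1, features).
data = np.reshape(data, (data.shape[0], 1, data.shape[1]))

# What the question passes: the full array shape, samples included.
wrong_input_shape = data.shape                       # (4, 1, 6)

# What the fix passes: the shape of one sample only.
right_input_shape = (data.shape[1], data.shape[2])   # (1, 6)


def expected_batch_shape(input_shape):
    """Hypothetical helper: prepend an unknown batch dimension (None),
    mimicking what Keras does with input_shape."""
    return (None,) + tuple(input_shape)


print(expected_batch_shape(wrong_input_shape))  # four dimensions
print(expected_batch_shape(right_input_shape))  # three dimensions, as an LSTM expects
```

With the full data.shape the layer is built for four-dimensional batches, which matches the ndim mismatch in the error message; with the per-sample shape it is built for the three dimensions (batch, timesteps, features) that an LSTM takes.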

Upvotes: 1
