james joyce

Reputation: 493

ValueError: `sequences` must be iterable in Keras

I am trying to build a sentiment analysis model, but when I start training I get the error ValueError: `sequences` must be iterable.

The error is raised by the pad_sequences call.

Here is the code up to and including the function that calls pad_sequences:

1) Get the word list, remove any punctuation, and convert all tokens to lowercase:

def get_processed_tokens(text):
    filtered_text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    filtered_text = filtered_text.split()
    filtered_text = [token.lower() for token in filtered_text]
    return filtered_text
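For illustration, here is a quick check of what this step produces. The sample sentence is made up and is not from the actual dataset; the function body is the same logic as above, condensed.

```python
import re

def get_processed_tokens(text):
    # Drop everything except letters, digits and whitespace,
    # then split on whitespace and lowercase each token.
    filtered_text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    return [token.lower() for token in filtered_text.split()]

sample = "Great movie, loved it!"  # made-up review text
tokens = get_processed_tokens(sample)
print(tokens)  # ['great', 'movie', 'loved', 'it']
```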

2) Create the token_idx dictionary, which maps tokens to integers for the embeddings, keeping only tokens that occur at least min_frequency times (default 5) in the training set:

def tokenize_text(data_text, min_frequency=5):
    review_tokens = [get_processed_tokens(review) for review in data_text]
    token_list = [token for review in review_tokens  for token in review] 
    token_freq_dict = {token:token_list.count(token) for token in set(token_list)}
    most_freq_tokens = [tokens for tokens in token_freq_dict if token_freq_dict[tokens] >= min_frequency]
    idx = range(len(most_freq_tokens))
    token_idx = dict(zip(most_freq_tokens, idx))
    return token_idx,len(most_freq_tokens)

3) Create the sequences that will be fed into the model to learn the embeddings: one fixed-length sequence of max_tokens entries per review in the dataset. Sequences shorter than the maximum length are pre-padded with zeros.

def create_sequences(data_text,token_idx,max_tokens):
    review_tokens  = [get_processed_tokens(review) for review in data_text] 
    review_token_idx = map( lambda review: [token_idx[k] for k in review if k in token_idx.keys() ], review_tokens)    
    padded_sequences = pad_sequences(review_token_idx, maxlen=max_tokens)  ##this line gives error
    return np.array(padded_sequences)

Upvotes: 3

Views: 3536

Answers (1)

today

Reputation: 33470

The pad_sequences function expects the sequences object it is given to have a __len__ attribute, which it uses to get the number of sequences. In Python 3, map returns a lazy map object that has no __len__, so review_token_idx fails that check. You need to convert it to an object that does have this attribute, e.g. a list:

padded_sequences = pad_sequences(list(review_token_idx), maxlen=max_tokens)
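The difference can be checked without Keras. The following is a minimal sketch with made-up data standing in for review_tokens and token_idx from the question; it only shows that the map object lacks __len__ while the list has it, and that a list comprehension builds the same list directly.

```python
# Made-up stand-ins for the question's review_tokens and token_idx.
review_tokens = [["good", "film"], ["bad", "film", "really", "bad"]]
token_idx = {"good": 0, "bad": 1, "film": 2}

# In Python 3, map() returns a lazy iterator with no __len__.
lazy = map(lambda review: [token_idx[k] for k in review if k in token_idx],
           review_tokens)
print(hasattr(lazy, "__len__"))       # False

# Materializing it as a list gives an object with __len__.
sequences = list(lazy)
print(hasattr(sequences, "__len__"))  # True

# A list comprehension avoids the map object altogether.
sequences_lc = [[token_idx[k] for k in review if k in token_idx]
                for review in review_tokens]
print(sequences_lc)                   # [[0, 2], [1, 2, 1]]
```

Either sequences or sequences_lc could then be passed to pad_sequences in place of the map object.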

Upvotes: 2
