Mithrand1r

Reputation: 2353

NLP for multi-feature data set using TensorFlow

I am just a beginner in this subject. I have tested some neural networks for image recognition, as well as NLP for sequence classification.

This second topic is the one that interests me. Using

from tensorflow.keras.preprocessing.text import Tokenizer

sentences = [
  'some test sentence',
  'and the second sentence'
]
tokenizer = Tokenizer(num_words=100, oov_token='<OOV>')
tokenizer.fit_on_texts(sentences)
sentences = tokenizer.texts_to_sequences(sentences)

will result, for each sentence, in an array of size [n, 1], where n is the number of words in that sentence. And, assuming I have implemented padding correctly, each training example in the set will be of size [n, 1], where n is the maximum sentence length.
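For reference, the padding step might look roughly like this (a minimal sketch using Keras' pad_sequences; max_sentence_length is just an assumed value):

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pad every tokenized sentence to the same length so the training set
# becomes a single [num_sentences, max_sentence_length] array.
padded = pad_sequences(sentences, maxlen=max_sentence_length, padding='post')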

That prepared training set I can then pass into Keras model.fit.

But what about when I have multiple features in my data set? Let's say I would like to build an event prioritization algorithm, and my data structure would look like:

[event_description, event_category, event_location, label]

Trying to tokenize such an array would result in an [n, m] matrix, where n is the maximum sentence length and m is the number of features.

How should I prepare such a dataset so that a model can be trained on it?

Would this approach be OK?

# Go through the training set to collect each feature into its own array
training_sentence = []
training_category = []
training_location = []
training_labels = []
for data in dataset:
  training_sentence.append(data['event_description'])
  training_category.append(data['event_category'])
  training_location.append(data['event_location'])
  training_labels.append(data['label'])

# Tokenize each array of raw text
tokenizer.fit_on_texts(training_sentence)
tokenizer.fit_on_texts(training_category)
tokenizer.fit_on_texts(training_location)
sequences = tokenizer.texts_to_sequences(training_sentence)
categories = tokenizer.texts_to_sequences(training_category)
locations = tokenizer.texts_to_sequences(training_location)

# Concatenate the feature arrays into one
training_example = numpy.concatenate([sequences, categories, locations])

# Omitting model definition; training the model
model.fit(training_example, training_labels, epochs=num_epochs, validation_data=(testing_padded, testing_labels_final))

I haven't tested it yet. I just want to make sure that I understand everything correctly and that my assumptions are right.

Is this a correct approach to building an NLP model with a NN?

Upvotes: 1

Views: 1253

Answers (1)

mcskinner

Reputation: 2748

I know of two common ways to manage multiple input sequences, and your approach lands somewhere between them.


One approach is to design a multi-input model with each of your text columns as a different input. They can share the vocabulary and/or embedding layer later, but for now you still need a distinct input sub-model for each of description, category, etc.

Each of these becomes an input to the network, using the Model(inputs=[...], outputs=rest_of_nn) syntax. You will need to design rest_of_nn so it can take multiple inputs. This can be as simple as your current concatenation, or you could use additional layers to do the synthesis.

It could look something like this:

# Imports assumed for this sketch.
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Flatten, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras import backend as K

# Build separate vocabularies. This could be shared.
desc_tokenizer = Tokenizer()
desc_tokenizer.fit_on_texts(training_sentence)
desc_vocab_size = len(desc_tokenizer.word_index) + 1  # + 1 so index 0 (padding) fits in the Embedding

categ_tokenizer = Tokenizer()
categ_tokenizer.fit_on_texts(training_category)
categ_vocab_size = len(categ_tokenizer.word_index) + 1

# Inputs.
desc = Input(shape=(desc_maxlen,))
categ = Input(shape=(categ_maxlen,))

# Input encodings, opting for different embeddings.
# Descriptions go through an LSTM as a demo of extra processing.
# The LSTM outputs categ_embed_size units so both encodings share the same
# feature dimension and can be concatenated along the time axis below.
embedded_desc = Embedding(desc_vocab_size, desc_embed_size, input_length=desc_maxlen)(desc)
encoded_desc = LSTM(categ_embed_size, return_sequences=True)(embedded_desc)
encoded_categ = Embedding(categ_vocab_size, categ_embed_size, input_length=categ_maxlen)(categ)

# Rest of the NN, which knows how to put everything together to get an output.
merged = concatenate([encoded_desc, encoded_categ], axis=1)
rest_of_nn = Dense(hidden_size, activation='relu')(merged)
rest_of_nn = Flatten()(rest_of_nn)
rest_of_nn = Dense(output_size, activation='softmax')(rest_of_nn)

# Create the model, assuming some sort of classification problem.
model = Model(inputs=[desc, categ], outputs=rest_of_nn)
model.compile(optimizer='adam', loss=K.categorical_crossentropy)

The second approach is to concatenate all of your data before encoding it, and then treat everything as a more standard single-sequence problem after that. It is common to use a unique token to separate or define the different fields, similar to BOS and EOS for the beginning and end of the sequence.

It would look something like this:

XXBOS XXDESC This event will be fun. XXCATEG leisure XXLOC Seattle, WA XXEOS

You can also do end tags for the fields like DESCXX, omit the BOS and EOS tokens, and generally mix and match however you want. You can even use this to combine some of your input sequences, but then use a multi-input model as above to merge the rest.
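A rough sketch of that preprocessing could look like this (the markers, variable names, and combined_maxlen are only illustrative, and I use lowercase markers so the Tokenizer's default lowercasing leaves them intact):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Join the fields of each example into one marked-up string,
# then treat the whole thing as a single sequence.
combined_texts = [
    'xxbos xxdesc %s xxcateg %s xxloc %s xxeos' % (desc, categ, loc)
    for desc, categ, loc in zip(training_sentence, training_category, training_location)
]

combined_tokenizer = Tokenizer(oov_token='<OOV>')
combined_tokenizer.fit_on_texts(combined_texts)
combined_sequences = combined_tokenizer.texts_to_sequences(combined_texts)
combined_padded = pad_sequences(combined_sequences, maxlen=combined_maxlen, padding='post')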


Speaking of mixing and matching, you also have the option to treat some of your inputs directly as an embedding. Low-cardinality fields like category and location do not need to be tokenized, and can be embedded directly without any need to split into tokens. That is, they don't need to be a sequence.
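For example, a field like category can be mapped straight to an integer id and embedded as a single vector (again just a sketch; the id mapping and names are illustrative):

import numpy as np
from tensorflow.keras.layers import Input, Embedding, Flatten

# Map each raw category string to an integer id, one id per example.
categ_to_id = {c: i for i, c in enumerate(sorted(set(training_category)))}
categ_ids = np.array([[categ_to_id[c]] for c in training_category])  # shape (N, 1)

# Embed the single id directly; no tokenizing, no sequence, no padding.
categ_input = Input(shape=(1,))
categ_encoded = Flatten()(Embedding(len(categ_to_id), categ_embed_size)(categ_input))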

If you are looking for a reference, I enjoyed this paper on Large Scale Product Categorization using Structured and Unstructured Attributes. It tests all or most of the ideas I have just outlined, on real data at scale.

Upvotes: 1
