Rumesa

Reputation: 103

How do I use Keras to load my own customized dataset for a convolutional neural network?

Below is sample code for the IMDB dataset. I am a beginner following a tutorial, and I am trying to load my own dataset in Keras. How would I modify this code? I would be very grateful.

import keras
from keras.datasets import imdb
from keras.preprocessing import sequence

# Using keras to load the dataset with the top_words
max_features = 10000  # max number of words to include, words are ranked by how often they occur (in training set)
max_review_length = 1600

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
print('loaded dataset...')
# Pad the sequences to the same length
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)

index_dict = keras.datasets.imdb.get_word_index()

Upvotes: 1

Views: 2766

Answers (1)

Danny Friar

Reputation: 393

Here's a simple solution with Pandas and CountVectorizer. You'll then need to pad the data and split into test and train as above.

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

data = {
    'label': [0, 1, 0, 1],
    'text': ['first bit of text', 'second bit of text', 'third text', 'text number four']
}
data = pd.DataFrame.from_dict(data)

# Build the vocabulary dictionary (word -> integer index) from the text column
vectorizer = CountVectorizer()
vectorizer.fit_transform(data['text'].tolist())
vocab_text = vectorizer.vocabulary_

# Convert each text into a list of integer word indices.
# Indices are offset by 1 so that 0 stays free for padding.
# (Simple whitespace split assumes lowercase text that matches the vectorizer's vocabulary.)
def convert_text(text):
    text_list = text.split(' ')
    return [vocab_text[t] + 1 for t in text_list]

data['text'] = data['text'].apply(convert_text)

# Get X and y matrices
y = np.array(data['label'])
X = np.array(data['text'])
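
As a rough sketch of the follow-up step the answer mentions (padding and splitting into train and test), something along these lines should work; the maxlen and test_size values here are just illustrative choices, and scikit-learn's train_test_split is used for the split:

from keras.preprocessing import sequence
from sklearn.model_selection import train_test_split

# Pad all integer sequences to the same length so Keras can batch them
max_review_length = 10  # illustrative; choose a length suited to your texts
X_padded = sequence.pad_sequences(X, maxlen=max_review_length)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_padded, y, test_size=0.25, random_state=42)

The padded X_train/X_test can then be fed to the model exactly like the IMDB arrays in the question's code.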

Upvotes: 1
