abolotnov

Reputation: 4332

Feeding vectorized data to keras

I am working on using some name:gender data to build and train a model that could predict the gender. I am trying the basics as I read about ML and probably got many things wrong. I haven't yet learnt how to generate and feed all the features that I want the network to use in its training. At this point, I am trying to prepare my data and have keras accept it for training.

I am trying to build a dictionary of the characters that appear in the names and feed each vectorized name into the model:

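from functools import reduce  # reduce is not a builtin in Python 3
import pandas as pd

# cm is my Django models module; vectorize is my own helper (not shown here)
names_frame = pd.DataFrame(list(cm.Name.objects.all().values())).drop('id', axis=1)
names_frame['name'] = names_frame['name'].str.lower()
names_frame['gender'] = names_frame['gender'].replace('Male',0).replace('Female', 1)
names_list = names_frame['name'].values
# enumerate every distinct character that occurs in any name
names_dict = list(enumerate(set(list(reduce(lambda x, y: x + y, names_list)))))
names_frame['vectorized'] = names_frame['name'].apply(vectorize, args=(names_dict,))
names_frame.sample()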

I end up with this:

       gender   gender_count  name   vectorized
20129  1        276           meena  [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ...

Then I build the model and try to train it:

X = names_frame['vectorized']
Y = names_frame['gender']
model = Sequential()
model.add(Dense(32, input_dim=1, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10)

And get the following exception:

ValueError: setting an array element with a sequence.

Both names_frame['gender'].shape and names_frame['vectorized'].shape are (34325,)

Basically, I am trying to feed it the vector and the gender label, but it looks like something is wrong with the input format. X is a pandas.Series; I tried converting it to np.array, but that didn't help.
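
The shape problem described above can be reproduced without Keras. This is only a minimal sketch on a made-up three-row Series (the variable names s, as_objects and stacked are assumptions for illustration, not part of the original code). It shows that a Series of lists converts to a 1-D object array, while stacking the inner lists gives a 2-D float array:

```python
import numpy as np
import pandas as pd

# Toy stand-in for names_frame['vectorized']: one Python list per row.
s = pd.Series([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

# Converting the Series directly keeps one list per element.
as_objects = np.array(s)
print(as_objects.shape, as_objects.dtype)   # 1-D, dtype object

# Stacking the inner lists row by row yields a 2-D numeric array.
stacked = np.vstack(s.tolist())
print(stacked.shape, stacked.dtype)         # 2-D, dtype float64
```

The 1-D object array matches the (34325,) shape reported above, which suggests the model is seeing one "feature" per row that happens to be a whole list.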

As I understand it, the input_dim parameter is the number of input elements the network receives per sample. I set it to 1 since I am giving it one array of values per row. Should it be 26 instead? When I change it to 26, I get a different exception:

ValueError: Error when checking input: expected dense_46_input to have shape (26,) but got array with shape (1,)

I assume this is because I am not giving it 26 individual pandas columns. Do I need to convert my array to columns or unpack it somehow?

Upvotes: 1

Views: 507

Answers (1)

keineahnung2345

Reputation: 2701

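Each element of your "vectorized" column is a list, so the Series converts to a 1-D array of objects with shape (n,) rather than a 2-D numeric array, which is what triggers the ValueError. Stack the inner lists into a 2-D array of shape (n_samples, n_features) and set input_dim to the number of features. A simple example: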

from keras.models import Sequential
from keras.layers import Dense
import pandas as pd
import numpy as np

df = pd.DataFrame({"vectorized": [[1,0,0],[0,1,0],[0,0,1]],
                   "gender": [1,0,1]})

# convert the inner list to numpy array
# X = np.array([np.array(l) for l in df["vectorized"]])
# or use a simpler way:
X = np.vstack(df["vectorized"])
Y = df["gender"].values

model = Sequential()
# input_dim must equal the number of features per sample, i.e. X.shape[1] (3 here)
model.add(Dense(32, input_dim=X.shape[1], activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10)
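
To connect this back to the name data: the question does not show its vectorize helper, so the version below is purely hypothetical. It assumes names contain only the letters a-z and marks which letters occur in the name, and its signature differs from the original vectorize(name, names_dict). It is a sketch of how input_dim could be derived from the stacked array rather than hard-coded:

```python
import string

import numpy as np
import pandas as pd

ALPHABET = string.ascii_lowercase  # assumption: names use only a-z


def vectorize(name, alphabet=ALPHABET):
    """Hypothetical helper: 1.0 for every alphabet letter present in the name."""
    present = set(name.lower())
    return [1.0 if ch in present else 0.0 for ch in alphabet]


# Made-up sample rows standing in for names_frame.
names = pd.DataFrame({"name": ["meena", "john"], "gender": [1, 0]})
names["vectorized"] = names["name"].apply(vectorize)

# Stack the per-row lists into a (n_samples, n_features) array.
X = np.vstack(names["vectorized"].tolist())
Y = names["gender"].values

# Pass this value as input_dim to the first Dense layer.
input_dim = X.shape[1]
print(X.shape, input_dim)
```

With X built this way, the model above should accept it once the first layer uses input_dim=X.shape[1].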

Upvotes: 1
