Reputation: 321
Got this simple exercise where I have to build a NN using Logistic Regression. My dataset is described this way:
You are given a dataset ("data.h5") containing:
- a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
- a test set of m_test images labeled as cat or non-cat
- each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB). Thus, each image is square (height = num_px) and (width = num_px).
To show an image from the database the text gives me an example :
# Example of a picture
index = 25
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:,index]) + ", it's a '" + classes[np.squeeze(train_set_y[:,index])].decode("utf-8") + "' picture.")
I have 2 questions :
1) I did not understand how this works: str(train_set_y[:,index])
2) The big problem is that, due to a site problem, I cannot download this database, and in order to do the exercise I would like to understand how it is built. Can someone intuitively tell me how it could be structured?
Upvotes: 2
Views: 3004
Reputation: 322
The dataset can be downloaded at this location (thanks to Anderson!).
Then build @taurz's lr_utils function and put it in any directory on sys.path, but make sure you delete 'datasets/' from train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
str(train_set_y[:,index]) is the label. Enter train_set_y to see all the labels: train_set_y.shape = (1, 209), and train_set_y[:,25][0] = 1, which means image 25 is a cat.
Upvotes: 0
Reputation: 194
I assume this code snippet is from the Coursera Deep Learning Course 1.
"train_set_y" is a vector of shape (1, 209) i.e it has the labels 0 or 1 for all 209 training examples, "train_set_y[:,25]" gives an integer label 0 or 1 from the 25th position of the vector train_set_y. As we are concatenating the strings ("y = " + str(train_set_y[:,index])). we need to convert it into string using str.
Check the lr_utils.py file in the notebook; it will give you a clear idea of how the dataset is loaded and transformed.
Below is the code snippet from the lr_utils.py file
import h5py
import numpy as np

def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    # reshape the label vectors from (m,) to (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
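Regarding the second question, since the download was a problem: you can build a structurally identical data.h5 yourself. This is a sketch under the assumption that the real file uses the dataset names above; the pixel values and labels here are random, not real cat images:

```python
import h5py
import numpy as np

# Mirror the layout lr_utils.py expects: m images of shape
# (num_px, num_px, 3), a flat label vector, and the class names.
num_px, m_train = 64, 209
rng = np.random.default_rng(0)

with h5py.File("train_catvnoncat.h5", "w") as f:
    f.create_dataset("train_set_x",
                     data=rng.integers(0, 256, (m_train, num_px, num_px, 3),
                                       dtype=np.uint8))
    f.create_dataset("train_set_y",
                     data=rng.integers(0, 2, (m_train,), dtype=np.int64))
    f.create_dataset("list_classes", data=np.array([b"non-cat", b"cat"]))

# Read it back the same way load_dataset does
with h5py.File("train_catvnoncat.h5", "r") as f:
    x = np.array(f["train_set_x"][:])
    y = np.array(f["train_set_y"][:]).reshape((1, m_train))

print(x.shape, y.shape)  # (209, 64, 64, 3) (1, 209)
```

A synthetic file like this lets you run the whole exercise end to end; the logistic regression just won't learn anything meaningful from random pixels.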
Upvotes: 5