Classifying grayscale images of handwritten digits (MNIST) using Keras


In this post, I will quickly solve the classification problem of grayscale images of handwritten digits using Keras. This dataset, known as MNIST, is a classic. It contains tens of thousands of handwritten digit images (28x28 pixels each), and the classification task is to categorize them into their ten classes: 0 to 9.

The dataset contains 60K training images (which we will further split into train and dev sets) plus 10K test images. The data were gathered by the National Institute of Standards and Technology (NIST) a few decades ago. For the deep-learning community, solving MNIST is the equivalent of printing “Hello World” for a computer-science major.

The MNIST dataset comes pre-loaded with Keras as a set of Numpy arrays.

import keras
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
print(train_images.shape)
print(train_labels.shape)
print(test_images.shape)
print(test_labels.shape)


(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)

train_images and train_labels come from the “training set,” the data that our model will learn from. Later on, we will split the train set into train and dev sets. Our model will be tested against the “test set,” containing test_images and test_labels. Our images are encoded as Numpy arrays, and the labels are simply an array of digits, ranging from 0 to 9. There is, of course, a direct correspondence between the images and the labels.

Let us look at a few of the images and their labels.

import matplotlib.pyplot as plt

# Display four sample training images together with their true labels
sample_indices = [10, 100, 230, 4010]

plt.figure(figsize=(20, 10), dpi=80, facecolor='w', edgecolor='k')
for i, idx in enumerate(sample_indices):
    plt.subplot(1, 4, i + 1)
    plt.imshow(train_images[idx], cmap=plt.cm.binary)
    plt.title("True label is " + str(train_labels[idx]), fontsize=20)
    plt.xticks(fontsize=20)
    plt.yticks(fontsize=20)

plt.show()

Next, we build a simple neural network with one hidden layer, consisting of a sequence of three layers. The first is a densely connected (Dense) layer with ReLU activation, the second is a Dropout layer for regularization, and the third is a 10-way “softmax” Dense layer, which returns an array of ten probability scores, where each score is the probability that the current digit image belongs to one of the ten digit classes.

from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='adam',                  # an optimizer
              loss='categorical_crossentropy',   # a loss function
              metrics=['accuracy'])              # metric to monitor during training and testing
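Before training, it can be handy (this is just an optional check using Keras's built-in summary method, not part of the original pipeline) to print the model architecture and parameter counts:

model.summary()  # lists each layer, its output shape, and the number of trainable parameters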

Next, we pre-process the data by reshaping it into the shape the network expects (flattening each 28x28 image into a vector of 784 values) and scaling it so that all values lie in the [0, 1] interval.

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

Next, we categorically encode the labels (one-hot encoding):

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
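As a quick sanity check (an optional aside, not strictly needed for the pipeline), we can look at the shape and one entry of the encoded labels; each label is now a one-hot vector of length ten:

print(train_labels.shape)  # (60000, 10): one one-hot row of length 10 per training image
print(train_labels[0])     # e.g. [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] -- a 1 at the index of the digit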

We then separate the training data into ‘train’ and ‘dev’ sets:

# The first 10,000 examples form the dev (validation) set; the remaining 50,000 are used for training
val_train_images = train_images[:10000, :]
partial_train_images = train_images[10000:, :]

val_train_labels = train_labels[:10000, :]
partial_train_labels = train_labels[10000:, :]

print(val_train_images.shape)
print(partial_train_images.shape)
print(val_train_labels.shape)
print(partial_train_labels.shape)
(10000, 784)
(50000, 784)
(10000, 10)
(50000, 10)

Let us train the model: we “fit” the model to its training data.

history = model.fit(partial_train_images, 
                    partial_train_labels, 
                    epochs=20, # 20 iterations over the training data
                    batch_size=128,  # each time with a batch of 128 training examples
                    validation_data=(val_train_images, val_train_labels))
Train on 50000 samples, validate on 10000 samples
Epoch 1/20
50000/50000 [==============================] - 6s 124us/step - loss: 0.3615 - acc: 0.8938 - val_loss: 0.1703 - val_acc: 0.9515
Epoch 2/20
50000/50000 [==============================] - 7s 131us/step - loss: 0.1724 - acc: 0.9493 - val_loss: 0.1233 - val_acc: 0.9646
Epoch 3/20
50000/50000 [==============================] - 6s 125us/step - loss: 0.1288 - acc: 0.9621 - val_loss: 0.0989 - val_acc: 0.9705
Epoch 4/20
50000/50000 [==============================] - 6s 130us/step - loss: 0.1030 - acc: 0.9695 - val_loss: 0.0886 - val_acc: 0.9740
Epoch 5/20
50000/50000 [==============================] - 6s 128us/step - loss: 0.0869 - acc: 0.9736 - val_loss: 0.0822 - val_acc: 0.9766
Epoch 6/20
50000/50000 [==============================] - 7s 132us/step - loss: 0.0758 - acc: 0.9766 - val_loss: 0.0766 - val_acc: 0.9776
Epoch 7/20
50000/50000 [==============================] - 7s 136us/step - loss: 0.0673 - acc: 0.9788 - val_loss: 0.0713 - val_acc: 0.9788
Epoch 8/20
50000/50000 [==============================] - 6s 128us/step - loss: 0.0609 - acc: 0.9807 - val_loss: 0.0686 - val_acc: 0.9787
Epoch 9/20
50000/50000 [==============================] - 7s 132us/step - loss: 0.0528 - acc: 0.9824 - val_loss: 0.0674 - val_acc: 0.9801
Epoch 10/20
50000/50000 [==============================] - 7s 132us/step - loss: 0.0492 - acc: 0.9844 - val_loss: 0.0669 - val_acc: 0.9802
Epoch 11/20
50000/50000 [==============================] - 6s 127us/step - loss: 0.0452 - acc: 0.9853 - val_loss: 0.0612 - val_acc: 0.9811
Epoch 12/20
50000/50000 [==============================] - 7s 133us/step - loss: 0.0409 - acc: 0.9866 - val_loss: 0.0658 - val_acc: 0.9807
Epoch 13/20
50000/50000 [==============================] - 7s 131us/step - loss: 0.0416 - acc: 0.9859 - val_loss: 0.0649 - val_acc: 0.9807
Epoch 14/20
50000/50000 [==============================] - 7s 142us/step - loss: 0.0356 - acc: 0.9886 - val_loss: 0.0622 - val_acc: 0.9823
Epoch 15/20
50000/50000 [==============================] - 7s 140us/step - loss: 0.0344 - acc: 0.9887 - val_loss: 0.0639 - val_acc: 0.9821
Epoch 16/20
50000/50000 [==============================] - 7s 144us/step - loss: 0.0328 - acc: 0.9890 - val_loss: 0.0677 - val_acc: 0.9817
Epoch 17/20
50000/50000 [==============================] - 7s 133us/step - loss: 0.0307 - acc: 0.9895 - val_loss: 0.0636 - val_acc: 0.9837
Epoch 18/20
50000/50000 [==============================] - 7s 138us/step - loss: 0.0291 - acc: 0.9899 - val_loss: 0.0656 - val_acc: 0.9826
Epoch 19/20
50000/50000 [==============================] - 6s 128us/step - loss: 0.0269 - acc: 0.9910 - val_loss: 0.0600 - val_acc: 0.9832
Epoch 20/20
50000/50000 [==============================] - 8s 154us/step - loss: 0.0260 - acc: 0.9913 - val_loss: 0.0684 - val_acc: 0.9814

The results are stored in ‘history.’ Four quantities are displayed during training: the loss and accuracy of the model over the training data, and the loss and accuracy over the validation (dev) data.

After just one epoch we reach a training accuracy of about 89%, and about 99% by the 20th. Now let’s check that the model performs well on the test set too:

history_dict = history.history
history_dict.keys()

test_loss, test_acc = model.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

10000/10000 [==============================] - 1s 75us/step
test_acc: 0.9836
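To make the test-set number a bit more concrete, here is a small optional sketch (using only standard Keras and NumPy calls; this was not part of the original walkthrough) that runs the trained model on a few test images and compares the predicted digit, taken as the argmax of the ten softmax scores, with the true label:

import numpy as np

# Predict class probabilities for the first five (already flattened and scaled) test images
probs = model.predict(test_images[:5])
predicted_digits = np.argmax(probs, axis=1)       # most probable class for each image
true_digits = np.argmax(test_labels[:5], axis=1)  # recover the digits from the one-hot labels

print('Predicted:', predicted_digits)
print('True:     ', true_digits)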

Lastly, we plot the loss and accuracy of the train and dev sets as a function of the epoch.

import matplotlib.pyplot as plt

plt.figure(figsize=(20,10))

plt.subplot(121)
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, 'bo', label='Training loss')
plt.plot(epochs, val_loss_values, 'b', label='Validation loss')
plt.title('Training and validation loss',fontsize=20)
plt.xlabel('Epochs',fontsize=20)
plt.ylabel('Loss',fontsize=20)
plt.legend(fontsize=20)
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)


plt.subplot(122)
acc_values = history_dict['acc']
val_acc_values = history_dict['val_acc']
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, acc_values, 'bo', label='Training acc')
plt.plot(epochs, val_acc_values, 'b', label='Validation acc')
plt.title('Training and validation accuracy',fontsize=20)
plt.xlabel('Epochs',fontsize=20)
plt.ylabel('Accuracy',fontsize=20)
plt.legend(fontsize=20)
plt.xticks(fontsize=20)
plt.yticks(fontsize=20)
plt.show()

For the training data, the loss goes down with the epoch number, which is not surprising. For the dev set, the loss more or less saturates after about ten epochs, but it does not go back up, which means we are not overfitting. The accuracy goes up with the epoch number, sharply for the training data but also for the dev data, which means the model keeps getting better at the supervised task as it repeatedly passes over the training examples.
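Since the validation loss flattens out after roughly ten epochs, one natural refinement (a hedged sketch using Keras's built-in EarlyStopping callback; it was not used above) would be to stop training automatically once the validation loss stops improving, instead of always running the full 20 epochs:

from keras.callbacks import EarlyStopping

# Stop training once val_loss has not improved for 3 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3)

history = model.fit(partial_train_images,
                    partial_train_labels,
                    epochs=20,
                    batch_size=128,
                    validation_data=(val_train_images, val_train_labels),
                    callbacks=[early_stop])

In practice you would rebuild and recompile the model before rerunning fit, so that the callback acts on a fresh network rather than continuing from the weights trained above.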
