4. MNIST example

In this chapter we show how to use Caffe to define and train the LeNet network for handwritten digit recognition.

Fetch datasets

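Run get_mnist.sh to download the data from the MNIST website, then create_mnist.sh to convert it to LMDB format. Both scripts are run from the Caffe root folder:

cd ~/src/caffe
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh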

Define the network

In Caffe, the network is defined with Google Protocol Buffers (protobuf), a mechanism for serializing structured data. The definition is written in a .prototxt file, which uses protobuf's human-readable text format. A sample can be found at ~/src/caffe/examples/mnist/lenet_train_test.prototxt.

  • Open an empty file, and name it lenet_train_test.prototxt
  • Give the network a name.

    name: "LeNet"
  • Define the bottom layer, called the data layer, which loads data from LMDB.

    layer {
      name: "mnist"
      type: "Data"
      transform_param {
        scale: 0.00390625
      }
      data_param {
        source: "examples/mnist/mnist_train_lmdb"
        backend: LMDB
        batch_size: 64
      }
      top: "data"
      top: "label"
    }

    This layer has type Data, which means it reads its input from a database, here LMDB.

    In transform_param, scale multiplies every input pixel. Since 0.00390625 = 1/256, raw pixel values in [0, 255] are mapped into the range [0, 1).

    data_param specifies where the data come from and how they are read: source is the path of the LMDB directory, backend selects the database type, and batch_size sets how many images are processed per iteration.

    The two top fields declare the blobs this layer produces, data and label. Blobs are the arrays Caffe uses to pass data between layers.
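The arithmetic behind the data layer can be checked with a short plain-Python sketch (no Caffe needed). It mimics what scale does to raw 8-bit pixels and spells out the N x C x H x W shape of one data batch. The helper name scale_pixels is hypothetical, and the 1 x 28 x 28 image size is the standard MNIST format rather than something set in this prototxt.

```python
# Sketch of what transform_param { scale: 0.00390625 } does to raw MNIST pixels.
# Assumption: raw pixels are 8-bit grayscale values in [0, 255].
SCALE = 0.00390625  # value copied from the data layer above

def scale_pixels(pixels):
    """Multiply every raw pixel by SCALE, as the Data layer's transform does."""
    return [p * SCALE for p in pixels]

# 0.00390625 is exactly 1/256, so the largest pixel (255) maps to 255/256 < 1.
assert SCALE == 1 / 256
print(scale_pixels([0, 128, 255]))  # [0.0, 0.5, 0.99609375]

# Shape of the "data" blob for one batch, N x C x H x W:
# batch_size 64, 1 grayscale channel, 28x28 MNIST images.
data_shape = (64, 1, 28, 28)
print(data_shape)
```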

  • Define the convolution layer.

    layer {
      name: "conv1"
      type: "Convolution"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      convolution_param {
        num_output: 20
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "data"
      top: "conv1"
    }

    This layer has type Convolution.

    The two param blocks set lr_mult, a multiplier applied to the learning rate that the solver uses at runtime. The first applies to the weights and the second to the biases, so the biases learn twice as fast as the weights.

    convolution_param sets the properties of the convolution: num_output is the number of filters (output feature maps), kernel_size: 5 gives 5x5 filters, and stride: 1 moves each filter one pixel at a time. The shared weights are initialized with the Xavier algorithm ("xavier"), and the biases with a constant filler, whose default value is 0.

    This layer takes the data blob as input (bottom) and produces the conv1 blob (top).
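A plain-Python sketch of the numbers behind conv1: the output size for a 28x28 input with no padding, the number of learnable parameters, the effective learning rates assuming the solver's base_lr of 0.01, and the range of the xavier filler. The helper conv_output_size is hypothetical. The xavier part assumes Caffe's default variance_norm of FAN_IN, under which weights are drawn uniformly from [-sqrt(3/fan_in), sqrt(3/fan_in)].

```python
import math
import random

def conv_output_size(in_size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution: (in + 2*pad - kernel) // stride + 1."""
    return (in_size + 2 * pad - kernel) // stride + 1

# conv1 on a 28x28 MNIST image: kernel 5, stride 1, no padding
out_size = conv_output_size(28, kernel=5, stride=1)
print(out_size)  # 24, so conv1 produces 20 feature maps of 24x24 per image

# Learnable parameters: 20 filters of shape 1x5x5, plus one bias per filter
num_output, channels, k = 20, 1, 5
weights = num_output * channels * k * k
biases = num_output
print(weights, biases)  # 500 20

# Effective learning rates, assuming base_lr 0.01 and lr_mult 1 (weights) / 2 (biases)
base_lr = 0.01
print(base_lr * 1, base_lr * 2)  # 0.01 0.02

# Xavier filler with variance_norm FAN_IN: uniform in [-a, a], a = sqrt(3 / fan_in)
fan_in = channels * k * k  # 25 input values feed each output value
a = math.sqrt(3 / fan_in)
sample = [random.uniform(-a, a) for _ in range(weights)]
print(round(a, 4))  # 0.3464
```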

  • Define the pooling layer.

    The pooling layer comes after the convolution layer.

    layer {
      name: "pool1"
      type: "Pooling"
      pooling_param {
        kernel_size: 2
        stride: 2
        pool: MAX
      }
      bottom: "conv1"
      top: "pool1"
    }

    pooling_param specifies max pooling (pool: MAX) over 2x2 windows (kernel_size: 2) that move 2 pixels at a time (stride: 2), so the windows never overlap and each feature map is halved in width and height.
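The output size and the pooling operation itself can be sketched in plain Python. The helper names are hypothetical; pool_output_size follows Caffe's convention of rounding the window count up, and max_pool_2x2 assumes an even-sized input so that every 2x2 window is complete.

```python
import math

def pool_output_size(in_size, kernel, stride, pad=0):
    """Pooling output size, rounding up: ceil((in + 2*pad - kernel) / stride) + 1."""
    return int(math.ceil((in_size + 2 * pad - kernel) / stride)) + 1

def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling with stride 2 on a list-of-lists image.

    Assumes even height and width, so every window is complete.
    """
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]

# pool1 halves conv1's 24x24 feature maps
print(pool_output_size(24, kernel=2, stride=2))  # 12

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]
print(max_pool_2x2(feature_map))  # [[4, 5], [3, 7]]
```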

  • Define the fully connected layer.

    layer {
      name: "ip1"
      type: "InnerProduct"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      inner_product_param {
        num_output: 500
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "pool2"
      top: "ip1"
    }

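    num_output in inner_product_param means this layer outputs 500 neurons. Its input, pool2, comes from a second convolution and pooling pair (conv2 with 50 filters, followed by pool2) that the sample file defines in the same way as conv1 and pool1.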
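An inner product (fully connected) layer computes y = Wx + b. The plain-Python sketch below uses a hypothetical helper name to show this on a toy input, then works out the size of ip1, assuming the sample file's conv2 (50 filters, 5x5, stride 1) and pool2 (2x2, stride 2) layers sit between pool1 and ip1.

```python
def inner_product(weights, x, bias):
    """Fully connected layer: y[i] = sum_j weights[i][j] * x[j] + bias[i]."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weights, bias)]

# Toy example: 3 inputs -> 2 outputs
W = [[1, 0, 2],
     [0, 1, -1]]
b = [0.5, 1.0]
print(inner_product(W, [1, 2, 3], b))  # [7.5, 0.0]

# Size of ip1: 28 -> conv1 24 -> pool1 12 -> conv2 8 -> pool2 4, so pool2 holds
# 50 maps of 4x4 per image, flattened to 800 inputs and mapped to 500 outputs.
in_features = 50 * 4 * 4
print(in_features, in_features * 500 + 500)  # 800 400500
```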

  • Define the ReLU layer.

    layer {
      name: "relu1"
      type: "ReLU"
      bottom: "ip1"
      top: "ip1"
    }

    Because ReLU is an element-wise operation, it can be computed in place to save memory, so the output (top) blob is set to the same blob as the input (bottom).
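In-place ReLU can be sketched in plain Python as a function that overwrites its input list, mirroring a layer whose top and bottom are the same blob. The function name is hypothetical, and it covers only the plain ReLU case (a negative slope of 0).

```python
def relu_inplace(blob):
    """Element-wise ReLU that overwrites its input list, like top == bottom in Caffe."""
    for i, v in enumerate(blob):
        blob[i] = v if v > 0 else 0.0
    return blob

ip1 = [-1.5, 0.0, 2.0, -0.25]
out = relu_inplace(ip1)
print(out, out is ip1)  # [0.0, 0.0, 2.0, 0.0] True
```

No second list is allocated, which is the memory saving the in-place setting buys.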

  • Define the second InnerProduct layer.

    layer {
      name: "ip2"
      type: "InnerProduct"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      inner_product_param {
        num_output: 10
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "ip1"
      top: "ip2"
    }

    This layer works like ip1, except that it outputs 10 values, one score for each digit class (0 to 9).

  • Define the loss layer.

    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "ip2"
      bottom: "label"
      top: "loss"
    }

    This is the last layer. It takes two input blobs: the class scores from ip2 and the labels produced by the data layer. It applies a softmax to the scores and outputs the resulting multinomial logistic (cross-entropy) loss as the loss blob.
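SoftmaxWithLoss combines a softmax over the 10 scores with the cross-entropy (negative log-likelihood) of the true label. A per-example plain-Python sketch follows; the helper names and the score vector are made up for illustration, and Caffe by default also averages the loss over the examples in the batch.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_with_loss(scores, label):
    """Cross-entropy loss of the softmax probabilities for one example: -log p[label]."""
    return -math.log(softmax(scores)[label])

# Hypothetical ip2 output for one image (one score per digit) and its true label 2
scores = [0.1, 0.2, 3.0, 0.0, -1.0, 0.5, 0.3, 0.0, 0.1, 0.2]
probs = softmax(scores)
print(round(sum(probs), 6))       # 1.0
print(probs.index(max(probs)))    # 2, the predicted digit
print(softmax_with_loss(scores, 2))
```

A confident, correct prediction gives a loss close to 0; a high score on the wrong class drives the loss up.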

Our network definition is now complete. For the full list of available options, see ~/src/caffe/src/caffe/proto/caffe.proto.

Define MNIST solver

Write the solver settings in a second file named lenet_solver.prototxt. A sample can be found at ~/src/caffe/examples/mnist/lenet_solver.prototxt.

  • Specify where the network definition file is. Here we use the sample file, but you may also use the file you just wrote. The path is relative to the Caffe root folder, from which training is started.

    net: "examples/mnist/lenet_train_test.prototxt"
  • test_iter specifies how many forward passes each test carries out. For MNIST, the test data layer in the sample file uses a batch size of 100, so 100 test iterations cover all 10,000 test images.

    test_iter: 100
  • Carry out testing every 500 training iterations.

    test_interval: 500
  • The base learning rate, momentum and the weight decay of the network.

    base_lr: 0.01
    momentum: 0.9
    weight_decay: 0.0005
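  • The learning rate policy. With "inv", the learning rate at iteration iter is base_lr * (1 + gamma * iter)^(-power), so it decreases gradually during training.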

    lr_policy: "inv"
    gamma: 0.0001
    power: 0.75
  • Display training information, such as the current loss, every 100 iterations.

    display: 100
  • The maximum number of iterations

    max_iter: 10000
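  • Snapshot intermediate results: save the model weights and solver state every 5000 iterations, using snapshot_prefix as the start of each file name.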

    snapshot: 5000
    snapshot_prefix: "examples/mnist/lenet"
  • Solver mode: CPU or GPU. If you built Caffe with CUDA and have a supported GPU, set solver_mode: GPU.

    solver_mode: CPU
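The solver settings above can be sanity-checked in plain Python: the "inv" learning rate at a few iterations, one momentum SGD update with L2 weight decay, the test coverage from test_iter, and the expected snapshot file names. This is a sketch, not Caffe's code. The helper names are hypothetical, sgd_momentum_step treats lr as the already-scaled rate (learning rate times the layer's lr_mult), and the file-name pattern assumes Caffe's usual <snapshot_prefix>_iter_<N>.caffemodel convention.

```python
# Values copied from lenet_solver.prototxt above
BASE_LR, GAMMA, POWER = 0.01, 0.0001, 0.75
MOMENTUM, WEIGHT_DECAY = 0.9, 0.0005

def inv_lr(iteration):
    """'inv' policy: base_lr * (1 + gamma * iteration) ** (-power)."""
    return BASE_LR * (1 + GAMMA * iteration) ** (-POWER)

def sgd_momentum_step(w, grad, history, lr):
    """One SGD update of a single weight with momentum and L2 weight decay:
    history = momentum * history + lr * (grad + weight_decay * w); w = w - history.
    Here lr is the already-scaled rate (learning rate times the layer's lr_mult).
    """
    history = MOMENTUM * history + lr * (grad + WEIGHT_DECAY * w)
    return w - history, history

# The learning rate decays slowly from 0.01 over the 10000 iterations
for it in (0, 5000, 10000):
    print(it, round(inv_lr(it), 6))

# One update of a weight of 1.0 with gradient 0.5 at iteration 0
w1, h1 = sgd_momentum_step(1.0, 0.5, 0.0, inv_lr(0))
print(w1, h1)

# test_iter (100) times the test batch size (100) covers the 10,000 test images
print(100 * 100)  # 10000

# Snapshot files expected with snapshot: 5000 and max_iter: 10000
snapshots = [f"examples/mnist/lenet_iter_{i}.caffemodel" for i in range(5000, 10001, 5000)]
print(snapshots)
```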

Training and testing the model

  • Start training.

    Make sure you are in the root folder of caffe.

    ./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt

    This starts the caffe tool with the specified solver file. Here we again use the sample solver file that ships with the Caffe source code.

    After many log messages scroll by, training finishes and the final test reports an accuracy of 0.9908.
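The accuracy printed during testing comes from the Accuracy layer in the sample lenet_train_test.prototxt, which is active only in the TEST phase and compares ip2 with label. A plain-Python sketch of the top-1 accuracy it computes, with a hypothetical helper name and toy scores:

```python
def top1_accuracy(batch_scores, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    correct = sum(
        1 for scores, label in zip(batch_scores, labels)
        if max(range(len(scores)), key=scores.__getitem__) == label
    )
    return correct / len(labels)

# Toy batch of 4 examples with 3 classes each; the last one is misclassified
scores = [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
labels = [1, 0, 2, 2]
print(top1_accuracy(scores, labels))  # 0.75

# An accuracy of 0.9908 over 10,000 test images means 9,908 correct predictions
print(round(0.9908 * 10000))  # 9908
```

Since every test batch has 100 images, averaging the per-batch accuracy over the 100 test iterations gives the same number as counting correct predictions over all 10,000 images.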

