4. MNIST example
In this chapter we will show how to use caffe to define and train the LeNet network to solve the handwritten digit recognition problem.
Fetch datasets
Run get_mnist.sh to download the data from the MNIST website, and create_mnist.sh to convert the data to LMDB format.
cd ~/src/caffe
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
Define the network
In caffe, we define the network using Google Protobuf, which serializes structured data automatically. We need to write the definition in a .prototxt file. The sample one can be found in ~/src/caffe/examples/mnist/lenet_train_test.prototxt.
Open an empty file and name it lenet_train_test.prototxt.
Give the network a name.
name: "LeNet"
Define the bottom layer. It loads data from LMDB and is called the data layer.
layer { name: "mnist" type: "Data" transform_param { scale: 0.00390625 } data_param { source: "mnist_train_lmdb" backend: LMDB batch_size: 64 } top: "data" top: "label" }
This layer has type Data, which means it fetches data from files. In transform_param, the scale is used to scale the input pixels into the range [0, 1) (0.00390625 is 1/256). data_param defines where the data comes from and how to use it; the source points to the folder where the LMDB files are. This layer produces 2 blobs (data & label) to communicate results within and across layers; these are declared by top.
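In the sample lenet_train_test.prototxt there are actually two data layers, each carrying an include { phase: ... } rule: one like the layer above for the training phase, and one that reads mnist_test_lmdb with a batch size of 100 for the test phase. A sketch of the test-phase layer, following the sample:

layer {
  name: "mnist"
  type: "Data"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "mnist_test_lmdb"
    backend: LMDB
    batch_size: 100
  }
  top: "data"
  top: "label"
}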
Define the convolution layer.
layer { name: "conv1" type: "Convolution" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } bottom: "data" top: "conv1" }
This layer has type Convolution. The lr_mult in param sets the multiplying factor on the learning rate given by the solver at runtime, for the weights and the biases respectively. The properties of the convolution layer are set in convolution_param: num_output is the number of filters (features); the convolution uses a 5x5 kernel, moves the kernel by 1 pixel in each step, initializes the shared weights using the xavier algorithm, and initializes the biases with the default constant 0. This layer receives input from the blob data and produces the output blob conv1.
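As a quick sanity check on the blob shapes (assuming the standard 28x28 MNIST images): a convolution with no padding produces an output of size (input - kernel_size) / stride + 1 = (28 - 5) / 1 + 1 = 24, so with batch_size 64 and 20 filters the conv1 blob is 64 x 20 x 24 x 24.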
Define the pooling layer.
The pooling layer comes after the convolution layer.
layer { name: "pool1" type: "Pooling" pooling_param { kernel_size: 2 stride: 2 pool: MAX } bottom: "conv1" top: "pool1" }
In pooling_param, the pooling uses a 2x2 kernel_size, moves 2 pixels each time (so that the kernels never overlap), and uses max pooling. Before the fully connected layers, the sample network repeats this convolution and pooling pattern once more (conv2 followed by pool2), which is why the next layer's bottom is pool2; see the sketch below.
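A sketch of those two layers, following the sample lenet_train_test.prototxt (conv2 uses 50 filters; pool2 has the same settings as pool1):

layer {
  name: "conv2"
  type: "Convolution"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  bottom: "pool1"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  pooling_param {
    kernel_size: 2
    stride: 2
    pool: MAX
  }
  bottom: "conv2"
  top: "pool2"
}

Define the fully connected layer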
layer { name: "ip1" type: "InnerProduct" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } bottom: "pool2" top: "ip1" }
num_output in inner_product_param means this layer outputs 500 neurons.

Define the ReLU layer.
layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" }
ReLU is an element-wise operation, so we can compute it in place to save memory; that is why the output blob is set to be the same as the input blob.
Define the second inner product layer
layer { name: "ip2" type: "InnerProduct" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } bottom: "ip1" top: "ip2" }
There is nothing new in this layer, except that it outputs 10 neurons, one for each digit class.
Define the loss layer.
layer { name: "loss" type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" }
Note that this is the last layer, so it produces no output; instead it takes two input blobs, one from the previous layer (ip2) and the other (label) from the data layer that reads from LMDB.
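Besides the loss, the sample lenet_train_test.prototxt also contains an Accuracy layer that is only evaluated during the test phase; it is what produces the accuracy figure reported at the end of training. A sketch following that sample:

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}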
Our network definition is now done. In case you need to specify more options, you may take a look at the file ~/src/caffe/src/caffe/proto/caffe.proto.
Define MNIST solver
Write this in another file and name it lenet_solver.prototxt. The sample file can be found in ~/src/caffe/examples/mnist/lenet_solver.prototxt.
Specify where the network definition file is. Here we use the example file, but you may also use the file you just wrote.
net: "~/src/examples/mnist/lenet_train_test.prototxt"
test_iter specifies how many forward passes the test should carry out. In the case of MNIST, we have test batch size 100 and 100 test iterations, covering the full 10,000 testing images.
test_iter: 100
Carry out testing every 500 training iterations.
test_interval: 500
The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
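For reference, Caffe's SGD solver uses these values roughly as follows: the weight decay adds weight_decay * W to each gradient as L2 regularization, and the update keeps a momentum history V, with V <- momentum * V - base_lr * gradient and W <- W + V.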
The learning rate policy
lr_policy: "inv" gamma: 0.0001 power: 0.75
Display every 100 iterations
display: 100
The maximum number of iterations
max_iter: 10000
Snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
Solver mode: CPU or GPU. If you have installed CUDA, you can specify GPU mode instead.
solver_mode: CPU
Training and testing the model
Start training.
Make sure you are in the root folder of caffe.
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
Start the caffe program with the solver file specified. Here we still use the sample solver file provided along with the caffe source code. After lots of messages fly by, the final test result is printed. The accuracy is 0.9908.
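If you want to evaluate a saved snapshot yourself afterwards, the caffe binary also has a test mode; a sketch (the weights file name follows from the snapshot_prefix and max_iter settings above):

./build/tools/caffe test --model=examples/mnist/lenet_train_test.prototxt --weights=examples/mnist/lenet_iter_10000.caffemodel --iterations=100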