A blazing-fast, CUDA-parallelized library for neural networks and machine learning. Highly optimized while maintaining a flexible API.
The FastNeurons neural network library is designed for training performance above all else. When compared with a popular sequential LeNet5 C implementation, training and testing on the MNIST handwritten digit database, FastNeurons achieves a speedup of more than 100x.
Below you'll find a comparison of two LeNet5-style neural network training runs (using MNIST), showing first FastNeurons, then an equivalent sequential CPU implementation available here.
Parallel version:
```
Train average 1: 3.751 seconds
Train average 2: 4.073 seconds
Train average 3: 3.732 seconds
Train average 4: 3.688 seconds
Test time 1: 1.062 seconds
Test time 2: 1.121 seconds
Test time 3: 0.915 seconds
Test time 4: 0.945 seconds

                              Predicted label
             0     1     2     3     4     5     6     7     8     9  Total
----------------------------------------------------------------------
True label
  0        970     0     2     0     0     0     4     1     3     0    980
  1          0  1125     0     4     0     0     3     0     3     0   1135
  2         11     7   969     9     5     1     5    10    15     0   1032
  3          2     0     4   970     1    13     0    11     6     3   1010
  4          0     0     1     1   958     0     8     0     2    12    982
  5          6     1     0     9     1   862     2     1     6     4    892
  6          9     3     0     1     4     5   931     0     5     0    958
  7          2    12    17     4     0     1     0   950     2    40   1028
  8         10     4     1     7    10     3     2     7   915    15    974
  9         10     6     0     5    17     7     0     4     7   953   1009
----------------------------------------------------------------------
Total number of input images tested = 10000
----------------------------------------------------------------------
Correct guesses: 9603 / 10000 (96.0300%)
```
Sequential version:
```
Total 1: 424 seconds
Total 2: 420.54 seconds

                              Predicted label
             0     1     2     3     4     5     6     7     8     9  Total
----------------------------------------------------------------------
True label
  0        971     0     1     0     0     1     3     1     3     0    980
  1          0  1122     1     3     1     0     1     0     7     0   1135
  2         10     2   989     2     5     0     5     9    10     0   1032
  3          1     3    12   949     0    21     0    11    12     1   1010
  4          1     2     1     0   961     0     6     1     2     8    982
  5          4     3     0     2     0   872     3     1     5     2    892
  6         10     4     2     1     4     3   930     0     4     0    958
  7          1    11    22     1     3     0     0   978     2    10   1028
  8          7     4     2     3     7     4     1     4   938     4    974
  9          9     8     0     3    27     9     1     7    10   935   1009
----------------------------------------------------------------------
Total number of input images tested = 10000
----------------------------------------------------------------------
Testing: Correct predictions = 9645 (96.45%)
```
When trained over multiple epochs, this classic convolutional neural network design can achieve accuracy of over 99%. LeNet5 is only one possible network, however: a developer can compose any other design from the library's fully connected, pooling, and convolutional layers. Below you'll find a code snippet showing how LeNet5 is implemented using FastNeurons.
```c
void NetworkCreateLeNet5(Network* network)
{
    const int kernelWidth = 5;

    NetworkCreate(network, 8); // 8 layers in total

    // Add the layers
    NetworkAddLayerInput2D(network, 1, 28, 28, 4);                              // 1x28x28 MNIST input
    NetworkAddLayerConvolution2D(network, 6, kernelWidth, ActivationTypeReLU);  // 6 feature maps
    NetworkAddLayerPooling2D(network, 2, PoolingTypeMax);                       // 2x2 max pooling
    NetworkAddLayerConvolution2D(network, 16, kernelWidth, ActivationTypeReLU); // 16 feature maps
    NetworkAddLayerPooling2D(network, 2, PoolingTypeMax);                       // 2x2 max pooling
    NetworkAddLayerFullyConnected(network, 120, ActivationTypeReLU);
    NetworkAddLayerFullyConnected(network, 84, ActivationTypeReLU);
    NetworkAddLayerFullyConnected(network, 10, ActivationTypeReLU);             // 10 digit classes
}
```