Authors: Yi Yang, Andy Chen, Xiaoming Chen, Jiang Ji, Zhenyang Chen, Yan Dai
ArXiv: 1805.09473
Abstract URL: http://arxiv.org/abs/1805.09473v1
Implementing large-scale deep neural networks with high computational
complexity on low-cost IoT devices is inevitably constrained by limited
computation resources, making it hard for the devices to respond in real time.
This mismatch makes state-of-the-art deep learning algorithms, e.g. CNNs
(Convolutional Neural Networks), incompatible with the IoT world. We present a
low-bit (ranging from 8-bit down to 1-bit) scheme based on our local
quantization region algorithm. We use models from the Caffe model zoo as
example tasks to evaluate the effect of our low-precision data representation
scheme. With local quantization regions available, we find that
implementations built on these schemes largely retain model accuracy while
also reducing computational complexity. For example, our 8-bit scheme shows no
drop in top-1 or top-5 accuracy, with a 2x speedup on the Intel Edison IoT
platform. Implementations based on our 4-bit, 2-bit, or 1-bit schemes are also
applicable to IoT devices, with the advantage of low computational complexity.
For example, the accuracy drop on our task is only 0.7% when using the 2-bit
scheme, which could save a large number of transistors. Making low-bit schemes
usable here opens a new door for further optimization on commodity IoT
controllers, e.g. extra speed-up could be achieved by replacing
multiply-accumulate operations with the proposed table look-up operations. The
whole study offers a new approach to relieve the challenge of bringing
advanced deep learning algorithms to resource-constrained, low-cost IoT
devices.
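
The abstract does not spell out the local quantization region algorithm, so
the following is only a minimal sketch of one plausible reading: values are
split into fixed-size regions, and each region is uniformly quantized over its
own [min, max] range rather than a single global range. It also illustrates
how a small look-up table indexed by two low-bit codes could stand in for the
multiplications of a dot product. All function names, the region size, and the
uniform min/max quantizer are assumptions for illustration, not the authors'
implementation.

```python
from typing import List, Tuple

# A quantized region: (integer codes, region minimum, step size).
Region = Tuple[List[int], float, float]


def quantize_region(values: List[float], bits: int) -> Region:
    """Uniformly quantize one region to 2**bits levels over its own range."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # Fall back to a unit step when the region is constant.
    scale = (hi - lo) / levels if hi > lo and levels > 0 else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, lo, scale


def quantize_local(values: List[float], bits: int, region_size: int) -> List[Region]:
    """Assumed scheme: quantize each fixed-size region independently."""
    return [
        quantize_region(values[start:start + region_size], bits)
        for start in range(0, len(values), region_size)
    ]


def dequantize_local(regions: List[Region]) -> List[float]:
    """Map codes back to real values using each region's own range."""
    out: List[float] = []
    for codes, lo, scale in regions:
        out.extend(lo + c * scale for c in codes)
    return out


def lut_dot(w_region: Region, a_region: Region, bits: int) -> float:
    """Dot product of two quantized regions using a precomputed product table.

    The table has 2**bits x 2**bits entries, so the inner loop performs only
    look-ups and additions (no multiplications).
    """
    w_codes, w_lo, w_scale = w_region
    a_codes, a_lo, a_scale = a_region
    n = 1 << bits
    table = [
        [(w_lo + i * w_scale) * (a_lo + j * a_scale) for j in range(n)]
        for i in range(n)
    ]
    return sum(table[i][j] for i, j in zip(w_codes, a_codes))


# Hypothetical data mixing a small-magnitude and a large-magnitude region.
weights = [0.0, 0.1, 0.2, 0.3, 10.0, 20.0, 30.0, 40.0]
local_2bit = dequantize_local(quantize_local(weights, bits=2, region_size=4))
global_2bit = dequantize_local(quantize_local(weights, bits=2, region_size=8))
print("local :", local_2bit)
print("global:", global_2bit)
```

In this toy input, a single global range assigns all the small values to the
same level, while per-region ranges keep them distinct. That is the intuition
behind quantizing locally, under the assumptions stated above.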