Tuesday, August 28, 2018


Deep Learning with Raspberry Pi -- Real-time object detection with YOLO v3 Tiny! [updated on Dec 19 2018, detailed instructions included]

A quick note on Dec 18 2018:
Since I posted this article in late August, I have been asked many times for detailed instructions and for the Python wrapper. Having been really busy over the last several months, I finally found some spare time to complete this blog with detailed instructions! All the information can be found in my GitHub repo, which was forked from shizukachan/darknet-nnpack. I have modified the Makefile, added the two Python non-blocking wrappers, and made some other minor modifications. It should "almost" work out of the box!
https://github.com/zxzhaixiang/darknet-nnpack/blob/yolov3/rpi_video.py

Here goes the updated article

I am a big fan of Yolo (You Only Look Once, Yolo website). Redmon & Farhadi's famous Yolo series has had a big impact on the deep learning community. By the way, their recent "paper" (YOLOv3: An Incremental Improvement) is an interesting read as well.

So, what is Yolo? Yolo is a cutting-edge object detection algorithm, i.e., it detects objects in images. Traditionally, people used a sliding window to scan an image and then tried to recognize the snapshot at every possible window location. This approach is of course very time consuming, because there are many ways to place the window and many computations are repeated. Yolo, standing for "You Only Look Once" (not You Only Live Once), smartly avoids those heavy computations by predicting object categories and their bounding boxes simultaneously, in a single pass over the image.

YoloV3 is one of the latest updates of the Yolo algorithm. The biggest change is that YoloV3 now uses only convolutional layers and no fully-connected layers. Don't let the technical term scare you away! What this implies is that YoloV3 no longer cares about the input image size! As long as the height and width are integer multiples of 32 (such as 224x224, 288x288, 608x288, etc.), YoloV3 will work fine. Another major improvement is that YoloV3 also makes predictions from intermediate layers, which means it does a better job detecting small objects than its previous versions.

I will have to skip the technical details here because the paper explains everything. The only thing you need to know is that Yolo is lightweight, fast, and decently accurate. It is so lightweight and fast that it can even run object detection in real time on a Raspberry Pi, a single-board computer with a smartphone-grade CPU, limited RAM, and no CUDA GPU. It is also convenient because the authors provide configuration files and weights trained on the COCO dataset, so there is no need to train your own model if you are only interested in detecting common objects.


Although Yolo is super efficient, it still requires quite a lot of computation. The original YoloV3, implemented by the same authors in their C framework called Darknet, will report a "segmentation fault" on the Raspberry Pi 3 Model B+ because the Pi simply cannot provide enough memory to load the weights. The YoloV3-tiny version, however, can run on the RPi 3, albeit very slowly.

Again, I wasn't able to run the full YoloV3 on the Pi 3, and I don't think it would be possible given YoloV3's large memory requirement. This article is all about implementing YoloV3-tiny on the Raspberry Pi 3 Model B!

Quite a few steps still have to be done to speed up yolov3-tiny on the Pi:
1. Install NNPACK, an acceleration library that lets neural networks run on multi-core CPUs
2. Add some special configuration to the Makefile so the Darknet Yolo source code compiles for the Cortex CPU with NNPACK optimization
3. Either install the OpenCV C++ bindings (a big pain on the Raspberry Pi) or write some Python code to wrap Darknet. I believe Yolo comes with a Python wrapper, but I haven't had a chance to test it on the RPi.
4. Download yolov3-tiny.cfg and yolov3-tiny.weights, and run Darknet with the Yolo tiny version (not the full version)!

Sounds complicated? Luckily, digitalbrain79 (not me) had already figured it out (https://github.com/digitalbrain79/darknet-nnpack). I had more luck with shizukachan's fork, and I made a few more changes to make it easier to follow:

Step 0: prepare Python and Pi Camera

Log in to the Raspberry Pi using SSH or directly in a terminal.
Make sure pip is installed (it should come with Raspbian):
sudo apt-get install python-pip
Install OpenCV. The simplest way on the RPi is as follows (do not build from source!):
sudo apt-get install python-opencv
Enable the Pi camera:
sudo raspi-config
Go to Interfacing Options and enable P1/Camera.
You will have to reboot the Pi to be able to use the camera.
A few additional words here. In the advanced options of raspi-config, you can adjust the memory split between the CPU and the GPU. Although we would like to allocate more RAM to the CPU so that the Pi can load a larger model, you will want to allocate at least 64MB to the GPU because the camera module requires it.
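To confirm the camera works after rebooting, a quick sanity check with the picamera Python library looks like the sketch below (this little test script is my own illustration, not part of any repo):

from picamera import PiCamera
from time import sleep

# Capture a single still frame to verify the camera interface is enabled
camera = PiCamera()
camera.resolution = (544, 416)   # any resolution works here; multiples of 32 matter later for Yolo
camera.start_preview()
sleep(2)                         # give the sensor a moment to adjust exposure
camera.capture('camera_test.jpg')
camera.stop_preview()
camera.close()
print('Saved camera_test.jpg')

If this script saves an image without errors, the camera is ready for the wrappers used later in this post.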

Step 1: Install NNPACK

NNPACK is used to optimize Darknet so it can run without a GPU. It is particularly useful for embedded devices with ARM CPUs.
Idein's qmkl is also used to accelerate SGEMM using the GPU. This is slower than NNPACK on NEON-capable devices and is primarily useful for ARM CPUs without NEON.
The NNPACK implementation in Darknet was improved to use transform-based convolution computation, allowing 40%+ faster inference on non-initial frames. This is most useful for repeated inference, i.e., video, or when Darknet is left open to keep processing input rather than being allowed to terminate after a single run.

Install Ninja (build tool)

Install PeachPy and confu
sudo pip install --upgrade git+https://github.com/Maratyszcza/PeachPy
sudo pip install --upgrade git+https://github.com/Maratyszcza/confu
Install Ninja
git clone https://github.com/ninja-build/ninja.git
cd ninja
git checkout release
./configure.py --bootstrap
export NINJA_PATH=$PWD
cd
Install clang (I'm not sure why we need this; NNPACK doesn't use it unless you specifically target it):
sudo apt-get install clang

Install NNPACK

Install modified NNPACK
git clone https://github.com/shizukachan/NNPACK
cd NNPACK
confu setup
python ./configure.py --backend auto
If you are compiling for the Pi Zero, change the last line to python ./configure.py --backend scalar
You can skip the following notes from the original darknet-nnpack repo; I did not find them necessary (or maybe I missed something):
It's also recommended to examine and edit https://github.com/digitalbrain79/NNPACK-darknet/blob/master/src/init.c#L215 to match your CPU architecture if you're on ARM, as the cache size detection code only works on x86.
Since none of the ARM CPUs have an L3 cache, it's recommended to set L3 = L2 and set inclusive=false. This should lead to the L2 size being set equal to the L3 size.
Ironically, after some trial and error, I've found that setting L3 to an arbitrary 2MB seems to work pretty well.
Build NNPACK with Ninja (this might take *quite* a while, so be patient. In fact, my Pi crashed the first time; just reboot and run it again):
$NINJA_PATH/ninja
Do an ls and you should see the folders lib and include if all went well:
ls
Test if NNPACK is working:
bin/convolution-inference-smoketest
In my case, the test actually failed the first time, but I ran it again and all items passed. So if your test fails, don't panic; try one more time.
Copy the libraries and header files to the system environment:
sudo cp -a lib/* /usr/lib/
sudo cp include/nnpack.h /usr/include/
sudo cp deps/pthreadpool/include/pthreadpool.h /usr/include/
If the convolution-inference-smoketest fails, you've probably hit a compiler bug and will have to change to Clang or an older version of GCC.
You can skip the qmkl/qasm/qbin2hex steps if you aren't targeting the QPU.
Install qmkl
sudo apt-get install cmake
git clone https://github.com/Idein/qmkl.git
cd qmkl
cmake .
make
sudo make install
Install qasm2
sudo apt-get install flex
git clone https://github.com/Terminus-IMRC/qpu-assembler2
cd qpu-assembler2
make
sudo make install
Install qbin2hex
git clone https://github.com/Terminus-IMRC/qpu-bin-to-hex
cd qpu-bin-to-hex
make
sudo make install

Step 2. Install darknet-nnpack

We have finally finished configuring everything needed. Now simply clone this repository. Note that we are cloning the yolov3 branch. It comes with the Python wrappers I wrote, the correct Makefile, and the yolov3-tiny weights:
cd
git clone -b yolov3 https://github.com/zxzhaixiang/darknet-nnpack
cd darknet-nnpack
git checkout yolov3
make
At this point, you have built darknet-nnpack with make. (If you are working from a different fork, be sure to edit the Makefile before compiling.)
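Before moving on to the camera wrappers, you can sanity-check the build on a still image. Here is a minimal sketch using Python's subprocess module; the cfg/weights paths are the ones used later in this post, and data/dog.jpg is the sample image that ships with darknet, so adjust the paths to whatever is in your clone:

import subprocess

# Run a single detection on a still image to verify the build works.
# Paths are assumptions -- point them at the cfg/weights in your clone.
subprocess.call(["./darknet", "detect",
                 "./cfg/yolov3-tiny.cfg",
                 "./yolov3-tiny.weights",
                 "./data/dog.jpg"])
# Darknet writes an annotated copy of the image (prediction.png in this fork)
# next to the executable; open it to confirm the boxes and labels look sensible.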

Step 3. Test with YoloV3-tiny

Despite all of this configuration, the Raspberry Pi is not powerful enough to run the full YoloV3. The YoloV3-tiny version, however, runs at about 1 frame per second.
I wrote two non-blocking Python wrappers to run Yolo, rpi_video.py and rpi_record.py. These scripts take pictures with the PiCamera Python library, spawn the darknet executable to run detection on each picture, and then load the resulting prediction.png and display it on the screen via OpenCV. All the detection work is done by darknet; the Python code simply handles input and output. rpi_video.py only displays the real-time detection result on the screen as an animation (about one frame every 1-1.5 seconds); rpi_record.py also saves each frame, for example to make a GIF animation afterwards. A simplified sketch of the idea follows below.
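For readers who just want the gist, here is a heavily simplified sketch of what the wrapper does. This is not the actual rpi_video.py (the real wrapper reads darknet's stdout non-blockingly to know when a frame is finished; this sketch just polls for the output file), and the frame filename is illustrative:

import os
from subprocess import Popen, PIPE
from picamera import PiCamera
import cv2

camera = PiCamera()
camera.resolution = (544, 416)            # width and height must be multiples of 32

# Keep one darknet process alive and feed it image paths on stdin
yolo_proc = Popen(["./darknet", "detect",
                   "./cfg/yolov3-tiny.cfg", "./yolov3-tiny.weights",
                   "-thresh", "0.1"],
                  stdin=PIPE, stdout=PIPE)

while True:
    camera.capture('frame.jpg')            # grab a frame from the Pi camera
    yolo_proc.stdin.write(b'frame.jpg\n')  # ask darknet to run detection on it
    yolo_proc.stdin.flush()
    if os.path.exists('prediction.png'):   # darknet writes the annotated image here
        img = cv2.imread('prediction.png')
        cv2.imshow('YoloV3-tiny', img)
        if cv2.waitKey(1) == 27:           # press Esc to quit
            break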
To test it, simply run
sudo python rpi_video.py
or
sudo python rpi_record.py
You can adjust the task type (detection/classification), the weights, the configuration file, and the threshold in this line:
yolo_proc = Popen(["./darknet",
                   "detect",
                   "./cfg/yolov3-tiny.cfg",
                   "./yolov3-tiny.weights",
                   "-thresh", "0.1"],
                   stdin = PIPE, stdout = PIPE)
For more details/weights/configuration/different ways to call darknet, refer to the official YOLO homepage.
As I mentioned, YoloV3-tiny does not care about the size of the input image, so feel free to adjust the camera resolution as long as both the height and the width are integer multiples of 32 (a small helper for this is sketched after the example below).

#camera.resolution = (224, 224)
#camera.resolution = (608, 608)
camera.resolution = (544, 416)
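If you would rather compute a valid resolution than hard-code one, a tiny helper (hypothetical, not part of the repo) can snap any desired size down to the nearest multiples of 32:

def snap_to_32(width, height):
    """Round a desired resolution down to the nearest multiples of 32."""
    return (width // 32) * 32, (height // 32) * 32

# Example: a request of 560x420 becomes 544x416
camera.resolution = snap_to_32(560, 420)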

Here are my test results:

1. It worked. Yolov3-tiny on the Raspberry Pi 3 Model B+ runs at about 1 frame per second (FPS). rpi_video.py prints the time Yolov3-tiny takes to predict on each image; I was getting numbers like 0.9 to 1.1 seconds per frame. Not bad at all! Of course, you can't do any rigorous fast object tracking, but for a surveillance camera, a slow robot, or even a drone, 1 FPS is promising. NNPACK is critical here; as pointed out by shizukachan, without NNPACK the frame rate drops below 0.1 FPS!

2. Make sure the power supply you are using can truly provide 2.4A (which is what the RPi 3B+ wants). I have seen cases where the detection speed dropped to 1 frame per 1.7 seconds because the power supply could not deliver enough current.

3. It worked, within limits. Yolov3-tiny is not that accurate compared to the full Yolov3. But if you want to detect specific objects in a specific scene, you can probably train your own Yolo v3 model (it must be the tiny version) on a desktop GPU and transplant it to the RPi. Never try to train the model on the RPi; don't even think about it. With Yolov3-tiny pre-trained on the COCO dataset, transfer learning can be leveraged to speed up training.

4. I didn't modify the source code of Yolo. When performing a detection task, Yolo outputs an image with the bounding boxes, labels, and confidences overlaid on top. If you would like to get this information in a digital form, you will have to dig into Yolo's source code and modify the output part. It should be relatively straightforward. Alternatively, a quick-and-dirty option is to parse darknet's console output, as sketched below.
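Darknet's detect mode also prints each detection as a text line (for example "dog: 85%") to stdout, so one lighter-weight option is to scrape those lines from the spawned process instead of editing the C source. A rough sketch, assuming the "label: NN%" print format of stock darknet (note that the default output gives you labels and confidences only, not box coordinates, which is why modifying the source is still needed for full results):

import re

# Parse lines such as "dog: 85%" from darknet's stdout into (label, confidence) pairs.
DETECTION = re.compile(r'^(\w[\w ]*):\s*(\d+)%')

def parse_detections(stdout_lines):
    results = []
    for line in stdout_lines:
        m = DETECTION.match(line.strip())
        if m:
            results.append((m.group(1), int(m.group(2)) / 100.0))
    return results

# Example: reading from the Popen object created earlier
# detections = parse_detections(line.decode() for line in yolo_proc.stdout)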

Finally, the results. Note that the video is sped up 5x; the actual frame rate is about 1 frame per second.


Yolov3-tiny successfully detected keyboard, banana, person (me), cup, and sometimes sofa, car, etc. It identified Curious George as a teddy bear the whole time, probably because the COCO dataset does not have a category called "Curious George stuffed animal". It got confused by the old-fashioned calculator and sometimes recognized it as a laptop or a cell phone. But in general, I was very pleasantly surprised by the results, and by the frame rate!



