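Yolo: Deep-Learning-Based Object Detection (Including YoloV3)

Yolo: object detection that is fun and has some technical depth. Enjoy it : )

I. Introduction to Yolo: Real-Time Object Detection

Yolo (You Only Look Once) is a family of neural-network algorithms for object detection, implemented in darknet, a niche framework. Joseph Redmon, who wrote darknet, did not rely on any of the well-known deep learning frameworks. The result is lightweight, has few dependencies, and the algorithm is efficient, which makes it valuable in industrial applications such as pedestrian detection and industrial image inspection.

The official site is very thorough; following its instructions is enough to get through a first round of detection and training with Yolo.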

YOLO: Real-Time Object Detection
https://pjreddie.com/darknet/yolo/

The detector command-line tool that Yolo provides is used as follows:

$ ./darknet detector [action] [file_config] [network_config] [weights]

Some common invocations:

$ ./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
$ ./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights
$ ./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights <video file>
$ ./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74
$ ./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74 -gpus 0,1,2,3
$ ./darknet detector train cfg/coco.data cfg/yolov3.cfg backup/yolov3.backup -gpus 0,1,2,3

When training on your own data, remember to edit the .cfg file (the paper describes the change):

...
# 3 x (classes + 5) = 315
filters = 315

[yolo]

...
# must match classes in file_config
classes = 100
...
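
The filters value above follows directly from the class count. A minimal sketch of that arithmetic, assuming 3 boxes per [yolo] layer as in the snippet (the function name `yolo_filters` is made up for illustration and is not part of darknet):

```python
def yolo_filters(num_classes: int, boxes_per_layer: int = 3) -> int:
    """Return boxes_per_layer * (num_classes + 5).

    The 5 stands for 4 box coordinates (x, y, w, h) plus 1 objectness score.
    """
    return boxes_per_layer * (num_classes + 5)


print(yolo_filters(100))  # 315, the value used in the snippet above
print(yolo_filters(80))   # 255, for an 80-class dataset
```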

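Other actions such as valid and recall are also available; see detector.c for details.

II. Overview of the Yolo Algorithm

Yolo is now in its third generation, but the ideas behind the first two are still well worth studying. The author shares implementation details openly and reports benchmark numbers with little inflation, which deserves real credit, and the code rewards close reading.

(The overview below is fairly rough; see the v1, v2, and v3 papers for details. They are well worth reading.)

1. YoloV1

Following the line of development from RCNN, fast RCNN, and faster RCNN to Yolo, Yolo's defining feature is that it performs object detection end-to-end: the whole image is the input to the neural network, which directly predicts bounding box coordinates, the confidence that each bounding box contains an object, and the object's class.

YoloV1 is fast enough to meet real-time requirements. Its weaknesses are that localization is not precise enough and small objects are detected poorly.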

Fig. models detection as a regression problem.

Fig. multi-part loss function.

2. YoloV2

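YoloV2 makes several improvements that address YoloV1's weaknesses:

Introduces the anchor boxes from Faster RCNN: instead of directly regressing bounding box coordinates, it predicts parameters relative to an anchor box, and uses K-Means to determine the anchor box shapes.
Removes the fc layers so that every layer is a conv layer.
Adds batch normalization to each layer and removes dropout.
Higher resolution: raises the ImageNet pretraining resolution from 224×224 to 448×448.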
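
To make "predicting parameters relative to an anchor box" concrete, here is a small sketch of the decoding step described in the YoloV2 paper: the center is a sigmoid offset inside a grid cell, and the width and height scale the anchor exponentially. The function and argument names are chosen for illustration only, and all values are in grid-cell units.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h):
    """Turn raw network outputs (tx, ty, tw, th) into a box (bx, by, bw, bh)."""
    bx = sigmoid(tx) + cell_x      # center x stays inside the responsible cell
    by = sigmoid(ty) + cell_y      # center y stays inside the responsible cell
    bw = anchor_w * math.exp(tw)   # width as a multiple of the anchor width
    bh = anchor_h * math.exp(th)   # height as a multiple of the anchor height
    return bx, by, bw, bh


# All-zero outputs give the cell center and the unmodified anchor shape.
print(decode_box(0, 0, 0, 0, cell_x=6, cell_y=6, anchor_w=3.0, anchor_h=2.0))
# (6.5, 6.5, 3.0, 2.0)
```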

Fig. Anchor box illustration

3. YoloV3

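YoloV3 does not introduce revolutionary ideas; instead it borrows from other papers to refine the model, and the gains are significant.

1. Uses a ResNet-style network (Residual Network)

The new backbone is Darknet-53, with 53 layers. As the network grows deeper (from roughly 20 to 30 layers up to about 50), it adopts the ResNet structure commonly used when deepening neural networks to address the gradient problem.

2. Uses FPN (Feature Pyramid Networks)

An FPN-style multi-scale prediction architecture improves small-object detection. The feature map goes from a single 13x13 layer to three layers of 13x13, 26x26, and 52x52, and prediction goes from 5 bounding boxes on a single layer to 3 bounding boxes per layer (9 in total); see the network architecture figure. With FPN, the better localization of the lower layers is fused with the better semantic features of the higher layers, and predictions are made independently at each feature level, which improves small-object detection markedly.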
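
The three grid sizes can be reproduced with a few lines of arithmetic. This sketch assumes a 416x416 input and downsampling strides of 32, 16, and 8, which is the combination that yields 13, 26, and 52; the function name `yolov3_grids` is hypothetical.

```python
def yolov3_grids(input_size=416, strides=(32, 16, 8), boxes_per_cell=3):
    """Return the grid size at each scale and the total number of predicted boxes."""
    grids = [input_size // stride for stride in strides]
    total_boxes = sum(g * g * boxes_per_cell for g in grids)
    return grids, total_boxes


grids, total_boxes = yolov3_grids(416)
print(grids)        # [13, 26, 52]
print(total_boxes)  # 10647 = (13*13 + 26*26 + 52*52) * 3
```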

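Fig. YoloV3 architecture

In addition, class prediction switches from softmax to logistic classifiers, which matches the reality that classes are not always mutually exclusive, and the matching between each bounding box and the ground truth becomes one-to-one.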
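
A small numeric sketch of why independent logistic outputs suit overlapping classes. The three logits are invented, standing for hypothetical labels such as "person", "woman", and "car", where the first two can both be true for the same box.

```python
import math


def softmax(logits):
    """Scores compete: they always sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logistic(logits):
    """Each class is scored independently between 0 and 1."""
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]


logits = [3.0, 3.0, -2.0]   # made-up scores for "person", "woman", "car"
print(softmax(logits))      # the two strong classes split the mass, each just under 0.5
print(logistic(logits))     # both strong classes stay above 0.9
```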

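III. Reading the Yolo Training Log

The code that controls the output of the detector train command is in examples/detector.c. The log has two parts: lines printed when a subdivision finishes and lines printed when a batch finishes.

1. Log at the end of a batch

[current iteration]: [cur loss], [avg loss] avg, [learning rate] rate, [batch time] seconds, [total images count] images

current iteration: the current iteration number; it increases by 1 with every batch.
cur loss: the current loss value.
avg loss: the average loss value.
learning rate: the learning rate; it follows the schedule set in the cfg file and usually decays.
batch time: the time spent on this batch, in seconds.
total images count: the number of images read so far.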
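
For plotting or monitoring, the batch line can be pulled apart with a regular expression. This is only a sketch: the pattern assumes the field order shown above, the sample line is invented for illustration rather than copied from a real run, and `parse_batch_line` is a hypothetical helper.

```python
import re

# Field order assumed from the format line above.
BATCH_LINE = re.compile(
    r"^\s*(\d+): (\S+), (\S+) avg, (\S+) rate, (\S+) seconds, (\d+) images"
)


def parse_batch_line(line):
    """Return the fields of a batch log line as a dict, or None if it does not match."""
    match = BATCH_LINE.match(line)
    if match is None:
        return None
    iteration, loss, avg_loss, rate, seconds, images = match.groups()
    return {
        "iteration": int(iteration),
        "loss": float(loss),
        "avg_loss": float(avg_loss),
        "learning_rate": float(rate),
        "seconds": float(seconds),
        "images": int(images),
    }


# Invented sample line, for illustration only.
sample = "9798: 0.370096, 0.451929 avg, 0.001000 rate, 3.300000 seconds, 627072 images"
print(parse_batch_line(sample))
```

Lines that do not follow the batch format, such as the per-subdivision lines, simply return None, so the helper can be run over a whole log file.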

2. Log at the end of a subdivision

Region 82 Avg IOU: [Avg IOU], Class: [class confid.], Obj: [object confid.], No Obj: [no object confid.], .5R: [0.5 Recall]

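Of these, IOU and R (recall) matter most: they are the indicators to watch during training, and higher values mean training is going better.

IV. Model Evaluation

Yolo models are evaluated mainly with the metrics common in object detection: IoU and mAP.

1. IoU (Intersection over Union)

The area of the intersection of the prediction and the ground truth divided by the area of their union (see the formula below). The most common criterion in detection tasks is .5 IoU, meaning a bounding box prediction counts as successful when its IoU > 0.5.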

$IoU(A, B) = \frac{|A \cap B|}{|A \cup B|}$

IoU illustration (ref: CS1674)

Comparison of different IoU values (ref: paper)
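
A minimal sketch of the IoU formula for axis-aligned boxes, assuming each box is given as corner coordinates (x1, y1, x2, y2). Darknet itself stores boxes as center plus width and height, so this representation is a choice made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so boxes that do not overlap get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

With the .5 IoU criterion, this pair would not count as a successful prediction.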

2. mAP (Mean Average Precision)

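Compute the precision for each class and average them. Precision here is judged using IoU as the criterion, usually .5 IoU.

TP(c): True Positive in class c; the predicted proposal matches the ground truth (the class is correct and the overlap is high enough).
FP(c): False Positive in class c; the predicted proposal does not match the ground truth (the class is wrong or the overlap is not high enough).

From the above, the precision for class c is:

$\frac{TP(c)}{TP(c) + FP(c)}$

Averaging this over all classes gives the following, a simplified form of mAP (standard benchmarks compute each class's average precision from its precision-recall curve before averaging over classes):

$mAP = \frac{1}{|classes|}\sum_{c \in classes} \frac{TP(c)}{TP(c) + FP(c)}$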
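
The simplified formula above can be computed directly from per-class counts. This sketch implements only that simplified definition, not the precision-recall-curve version used by benchmark toolkits; the counts and the name `mean_precision` are invented for illustration.

```python
def mean_precision(tp, fp):
    """Average TP / (TP + FP) over classes.

    tp and fp map a class name to its true-positive and false-positive counts.
    Classes with no predictions at all are skipped.
    """
    precisions = []
    for cls, tp_count in tp.items():
        total = tp_count + fp.get(cls, 0)
        if total == 0:
            continue
        precisions.append(tp_count / total)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Made-up counts: dog precision 8/10, cat precision 3/4, mean about 0.775.
tp = {"dog": 8, "cat": 3}
fp = {"dog": 2, "cat": 1}
print(mean_precision(tp, fp))
```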

3. mAP@[.5:.95]

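Since fixing an IoU threshold (such as .5) gives one mAP, one can compute mAP at every threshold from 0.5 to 0.95 in steps of 0.05 and average the results; that is mAP@[.5:.95]. The Yolo author, however, feels that humans cannot tell the difference at very high IoU, so the metric may be less useful in practice.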
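
A sketch of the averaging over thresholds. The callable `map_at` is a stand-in for whatever routine computes mAP at a single IoU threshold; it is an assumption of this example, and the lambda below is a toy placeholder rather than a real evaluation.

```python
def iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """Return the thresholds 0.5, 0.55, ..., 0.95 (10 values with the defaults)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


def map_over_thresholds(map_at):
    """Average map_at(threshold) over all thresholds in [.5:.95]."""
    thresholds = iou_thresholds()
    return sum(map_at(t) for t in thresholds) / len(thresholds)


print(iou_thresholds())
# Toy stand-in: pretend mAP falls linearly as the threshold gets stricter.
print(map_over_thresholds(lambda t: 1.0 - t))
```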

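V. Ways to Tune a Yolo Model

Broadly, tuning a Yolo model is similar to tuning any neural network model. The distribution of the data still accounts for a large part of whether the model succeeds. The mainstream techniques, such as tuning the learning rate, fine-tuning, and data augmentation, should certainly be tried; the other methods are worth trying if time allows.

1. Preparing the dataset: data augmentation

Data augmentation can be configured in the configuration file (.cfg), for example: rotation (angle), exposure (exposure), saturation (saturation), and hue (hue).

2. Preparing the dataset: negative training samples

Include negative training samples, that is, images with no bounding boxes.

3. Increase the resolution

For training, set the resolution in the configuration file (.cfg) to the highest the network supports (height=608, width=608), or to any multiple of 32. Alternatively, training at a lower resolution and raising the resolution at detection time can also recover some accuracy.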
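
When picking a resolution, a small helper can snap an arbitrary size to a valid one. This sketch assumes the network downsamples by a total factor of 32, so width and height should be multiples of 32; `valid_input_size` is a hypothetical name.

```python
def valid_input_size(size, stride=32):
    """Largest multiple of stride that does not exceed size (at least one stride)."""
    return max(stride, (size // stride) * stride)


print(valid_input_size(620))  # 608
print(valid_input_size(416))  # 416
print(valid_input_size(10))   # 32
```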

4. Recompute anchor box sizes by clustering

Following the original paper, use k-means to select suitable anchor boxes from the training samples and put the result in the configuration file.
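
A compact sketch of the clustering idea, assuming each training box is reduced to a (width, height) pair and that similarity is measured by IoU between shapes (so the distance is 1 - IoU, as in the YoloV2 paper). The sample boxes are made up and the function names are illustrative; this is not the tool shipped with any darknet fork.

```python
import random


def wh_iou(box, centroid):
    """IoU of two (w, h) shapes placed at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (w, h) pairs into k anchors, assigning each box to its highest-IoU centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: wh_iou(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
                continue
            mean_w = sum(b[0] for b in members) / len(members)
            mean_h = sum(b[1] for b in members) / len(members)
            new_centroids.append((mean_w, mean_h))
        if new_centroids == centroids:
            break  # assignments stopped changing
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Made-up box sizes in pixels: three small objects and three large ones.
boxes = [(10, 12), (11, 13), (12, 11), (100, 90), (95, 105), (110, 100)]
for w, h in kmeans_anchors(boxes, k=2):
    print(round(w, 1), round(h, 1))
```

The resulting pairs, sorted from small to large, are what would go into the anchors line of the configuration file, after scaling to whatever units that file expects.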

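5. Fine-tune the model

Setting stopbackward=1 freezes certain layers during training, which can be used for fine-tuning.

6. Running out of memory: increase subdivisions

The network weights are updated once per batch. subdivisions splits each batch into that many smaller parts of [batch/subdivisions] images each, which are processed one at a time, so increasing subdivisions reduces the memory load.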

in .cfg file:

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=8
...
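
The effect of subdivisions on memory comes down to one division. A tiny sketch, with a hypothetical function name, showing how many images are held in memory per forward pass for the settings above:

```python
def images_per_pass(batch, subdivisions):
    """Images processed at once: the batch is split into `subdivisions` parts."""
    return batch // subdivisions


print(images_per_pass(64, 8))   # 8 images at a time with the settings above
print(images_per_pass(64, 16))  # 4 images at a time, roughly halving activation memory
```

The weights are still updated once per 64 images in both cases; only the number of images processed together changes.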

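Other: Yolo's Author, Joseph Redmon

Joseph Redmon, the author of Yolo, is quite an interesting person. The YoloV3 tech report is written with great flair, the commits on the Yolo GitHub are adorable, and his résumé is covered in rainbow ponies. He very much has his own style. The picture below gives a taste of the master's charm : )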

References

J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In CVPR, 2016.
https://arxiv.org/abs/1506.02640

J. Redmon and A. Farhadi. YOLO9000: better, faster, stronger. arXiv preprint arXiv:1612.08242, 2016.
https://arxiv.org/abs/1612.08242

Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. Feature pyramid networks for object detection. arXiv preprint arXiv:1612.03144, 2016.
https://arxiv.org/abs/1612.03144

Joseph Chet Redmon - YOLO: Real-Time Object Detection
https://pjreddie.com/darknet/yolo/
https://pjreddie.com/darknet/

AlexeyAB/darknet - Yolo-v3 and Yolo-v2 for Windows and Linux
https://github.com/AlexeyAB/darknet

Cartucho - OpenLabeling
https://github.com/Cartucho/OpenLabeling

Timebutt - Understanding YOLOv2 training output
https://timebutt.github.io/static/understanding-yolov2-training-output/

Timebutt - How to train YOLOv2 to detect custom objects
https://timebutt.github.io/static/how-to-train-yolov2-to-detect-custom-objects/

nusit_305 - YOLOv3: 訓練自己的數據
https://blog.csdn.net/lilai619/article/details/79695109

螞蟻 flow - YOLO配置文件理解
http://www.cnblogs.com/antflow/p/7275486.html

xmfbit - Yolo 論文閱讀
https://xmfbit.github.io/2017/02/04/yolo-paper/

xmfbit - 論文 Yolo v3
https://xmfbit.github.io/2018/04/01/paper-yolov3/

Tuesday, June 26, 2018