AIoT: YOLO v5 on Keras
YOLO v5
TensorFlow Keras implementation from scratch
Training on a Custom Dataset
In tutorial 03 on this blog I used object detection by transfer learning and converted the model to TFLite for Raspberry Pi usage. In this new one I will implement a YOLO model from scratch and use it on a custom dataset. YOLO (You Only Look Once) is one of the fastest object detectors; its v5 version was implemented in PyTorch, while the previous versions were written in C.
After a description of the architecture evolution of the model, we give an implementation in Keras under TensorFlow; the next step will be adaptation for MCU devices.
Let's import what we need first.
In [1]:
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf
from skimage.transform import resize
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D, BatchNormalization, LeakyReLU, ZeroPadding2D, UpSampling2D
from keras.models import load_model, Model
from keras.layers import add, concatenate
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from matplotlib import pyplot
from matplotlib.pyplot import imshow
from matplotlib.patches import Rectangle
%matplotlib inline
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import xml.etree.ElementTree as ET
from tqdm import tqdm
import random
import shutil
from PIL import Image, ImageDraw
from sklearn.model_selection import train_test_split
### from __init__
from kyolov5.yolo import Yolo
from kyolov5.optimizer import Optimizer, LrScheduler
from kyolov5.read_data import DataReader, transforms
from kyolov5.load_data import DataLoader
from kyolov5.trainer import *
from kyolov5.image_utils import resize_image
from kyolov5.loss import YoloLoss
from IPython.display import clear_output, display
#from kyolov5.config import params
np.random.seed(1919)
tf.random.set_seed(1949)
tf.__version__
Out[1]:
'2.8.0'
Download the Data
For this tutorial, we are going to use an object detection dataset of road signs from MakeML.
It is a dataset that contains road signs belonging to 4 classes:
Traffic Light
Stop
Speed Limit
Crosswalk
The dataset is a small one, containing only 877 images in total. While you may want to train with a larger dataset (like the LISA dataset) to fully realize the capabilities of YOLO, we use a small dataset in this tutorial to facilitate quick prototyping. A typical training run takes less than half an hour, which lets you iterate quickly with experiments involving different hyperparameters.
We create a directory called Road_Sign_Dataset under data to keep our dataset in; the DATASET_DIR variable below points to it.
In [2]:
DATASET_DIR='data/Road_Sign_Dataset'
!mkdir data/Road_Sign_Dataset
mkdir: cannot create directory ‘data/Road_Sign_Dataset’: File exists
In [3]:
!cd data/Road_Sign_Dataset;wget -O RoadSignDetectionDataset.zip https://arcraftimages.s3-accelerate.amazonaws.com/Datasets/RoadSigns/RoadSignsPascalVOC.zip?region=us-east-2
--2022-06-16 07:15:07-- https://arcraftimages.s3-accelerate.amazonaws.com/Datasets/RoadSigns/RoadSignsPascalVOC.zip?region=us-east-2 Resolving feynmanmaster01.cluster.local (feynmanmaster01.cluster.local)... 10.2.23.250 Connecting to feynmanmaster01.cluster.local (feynmanmaster01.cluster.local)|10.2.23.250|:3128... connected. Proxy request sent, awaiting response... 200 OK Length: 229344361 (219M) [application/zip] Saving to: ‘RoadSignDetectionDataset.zip’ RoadSignDetectionDa 100%[===================>] 218.72M 53.2MB/s in 4.7s 2022-06-16 07:15:12 (46.6 MB/s) - ‘RoadSignDetectionDataset.zip’ saved [229344361/229344361]
In [7]:
#!cd data/Road_Sign_Dataset;unzip RoadSignDetectionDataset.zip
In [8]:
!cd data/Road_Sign_Dataset;rm -r __MACOSX RoadSignDetectionDataset.zip
Convert the Annotations into the YOLO v5 Format
In this part, we convert the annotations into the format expected by YOLO v5. There are a variety of formats when it comes to annotations for object detection datasets.
Annotations for the dataset we downloaded follow the PASCAL VOC XML format, which is a very popular format. Since this is a popular format, you can find online conversion tools. Nevertheless, we are going to write the code for it to give you some idea of how to convert less popular formats as well (for which you may not find ready-made tools).
The PASCAL VOC format stores its annotations in XML files where various attributes are described by tags. Let us look at one such annotation file.
In [9]:
# Assuming you're in the data folder
!cd data/Road_Sign_Dataset;cat annotations/road4.xml
<annotation> <folder>images</folder> <filename>road4.png</filename> <size> <width>267</width> <height>400</height> <depth>3</depth> </size> <segmented>0</segmented> <object> <name>trafficlight</name> <pose>Unspecified</pose> <truncated>0</truncated> <occluded>0</occluded> <difficult>0</difficult> <bndbox> <xmin>20</xmin> <ymin>109</ymin> <xmax>81</xmax> <ymax>237</ymax> </bndbox> </object> <object> <name>trafficlight</name> <pose>Unspecified</pose> <truncated>0</truncated> <occluded>0</occluded> <difficult>0</difficult> <bndbox> <xmin>116</xmin> <ymin>162</ymin> <xmax>163</xmax> <ymax>272</ymax> </bndbox> </object> <object> <name>trafficlight</name> <pose>Unspecified</pose> <truncated>0</truncated> <occluded>0</occluded> <difficult>0</difficult> <bndbox> <xmin>189</xmin> <ymin>189</ymin> <xmax>233</xmax> <ymax>295</ymax> </bndbox> </object> </annotation>
The above annotation file describes a file named road4.png that has dimensions of 267 x 400 x 3. It has 3 object tags which represent 3 bounding boxes. The class is specified by the name tag, whereas the details of the bounding box are represented by the bndbox tag. A bounding box is described by the coordinates of its top-left (xmin, ymin) corner and its bottom-right (xmax, ymax) corner.
YOLO v5 Annotation Format
YOLO v5 expects annotations for each image in the form of a .txt file, where each line of the text file describes a bounding box. For example, an image containing two persons and one tie would have an annotation file with three lines, one per object. The specification for each line is as follows:
One row per object
Each row is in class x_center y_center width height format.
Box coordinates must be normalized by the dimensions of the image (i.e. have values between 0 and 1).
Class numbers are zero-indexed (start from 0).
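To make the conversion concrete, here is a small worked example using the first trafficlight box from road4.xml shown above (xmin=20, ymin=109, xmax=81, ymax=237 in a 267 x 400 image):
# Worked example: convert one VOC box from road4.xml to a YOLO v5 line
xmin, ymin, xmax, ymax = 20, 109, 81, 237
img_w, img_h = 267, 400

x_center = (xmin + xmax) / 2 / img_w   # 50.5 / 267  ~ 0.189
y_center = (ymin + ymax) / 2 / img_h   # 173 / 400   ~ 0.432
width    = (xmax - xmin) / img_w       # 61 / 267    ~ 0.228
height   = (ymax - ymin) / img_h       # 128 / 400   = 0.320

# class 0 is trafficlight, so the annotation line is roughly:
print("0 {:.3f} {:.3f} {:.3f} {:.3f}".format(x_center, y_center, width, height))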
We now write a function that will take the annotations in VOC format and convert them to a format where the information about the bounding boxes is stored in a dictionary.
In [22]:
# Function to get the data from XML Annotation
def extract_info_from_xml(xml_file):
    root = ET.parse(xml_file).getroot()

    # Initialise the info dict
    info_dict = {}
    info_dict['bboxes'] = []

    # Parse the XML Tree
    for elem in root:
        # Get the file name
        if elem.tag == "filename":
            info_dict['filename'] = elem.text

        # Get the image size
        elif elem.tag == "size":
            image_size = []
            for subelem in elem:
                image_size.append(int(subelem.text))
            info_dict['image_size'] = tuple(image_size)

        # Get details of the bounding box
        elif elem.tag == "object":
            bbox = {}
            for subelem in elem:
                if subelem.tag == "name":
                    bbox["class"] = subelem.text
                elif subelem.tag == "bndbox":
                    for subsubelem in subelem:
                        bbox[subsubelem.tag] = int(subsubelem.text)
            info_dict['bboxes'].append(bbox)

    return info_dict
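As a quick sanity check on the road4.xml file printed above, the function should return something along these lines:
# Parse the annotation file shown earlier and inspect the result
info = extract_info_from_xml('data/Road_Sign_Dataset/annotations/road4.xml')
print(info['filename'], info['image_size'])
print(info['bboxes'][0])
# Expected (roughly):
# road4.png (267, 400, 3)
# {'class': 'trafficlight', 'xmin': 20, 'ymin': 109, 'xmax': 81, 'ymax': 237}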
We now write a function to convert the information contained in info_dict to YOLO v5 style annotations and write them to a txt file. In case your annotations differ from PASCAL VOC ones, you can write a function to convert them to the info_dict format and then use the function below to produce YOLO v5 style annotations.
In [7]:
# Dictionary that maps class names to IDs
class_name_to_id_mapping = {"trafficlight": 0,
"stop": 1,
"speedlimit": 2,
"crosswalk": 3}
# Convert the info dict to the required yolo format and write it to disk
def convert_to_yolov5(info_dict):
    print_buffer = []

    # For each bounding box
    for b in info_dict["bboxes"]:
        try:
            class_id = class_name_to_id_mapping[b["class"]]
        except KeyError:
            print("Invalid Class. Must be one from ", class_name_to_id_mapping.keys())
            continue

        # Transform the bbox co-ordinates as per the format required by YOLO v5
        b_center_x = (b["xmin"] + b["xmax"]) / 2
        b_center_y = (b["ymin"] + b["ymax"]) / 2
        b_width = (b["xmax"] - b["xmin"])
        b_height = (b["ymax"] - b["ymin"])

        # Normalise the co-ordinates by the dimensions of the image
        image_w, image_h, image_c = info_dict["image_size"]
        b_center_x /= image_w
        b_center_y /= image_h
        b_width /= image_w
        b_height /= image_h

        # Write the bbox details to the buffer
        print_buffer.append("{} {:.3f} {:.3f} {:.3f} {:.3f}".format(class_id, b_center_x, b_center_y, b_width, b_height))

    # Name of the file which we have to save
    save_file_name = os.path.join("data/Road_Sign_Dataset/annotations", info_dict["filename"].replace("png", "txt"))

    # Save the annotation to disk
    print("\n".join(print_buffer), file=open(save_file_name, "w"))
Now we convert all the XML annotations into YOLO-style txt ones.
In [24]:
# Get the annotations
annotations = [os.path.join('data/Road_Sign_Dataset/annotations', x) for x in os.listdir('data/Road_Sign_Dataset/annotations') if x[-3:] == "xml"]
annotations.sort()
#print(annotations)
# Convert and save the annotations
for ann in tqdm(annotations):
    info_dict = extract_info_from_xml(ann)
    convert_to_yolov5(info_dict)
annotations = [os.path.join('data/Road_Sign_Dataset/annotations', x) for x in os.listdir('data/Road_Sign_Dataset/annotations') if x[-3:] == "txt"]
Testing the annotations
Just for a sanity check, let us now test some of these transformed annotations. We randomly load one of the annotations and plot boxes using the transformed annotations, and visually inspect it to see whether our code has worked as intended.
Run the next cell multiple times. Every time, a random annotation is sampled.
In [33]:
random.seed(0)
class_id_to_name_mapping = dict(zip(class_name_to_id_mapping.values(), class_name_to_id_mapping.keys()))
def plot_bounding_box(image, annotation_list):
    annotations = np.array(annotation_list)
    w, h = image.size

    plotted_image = ImageDraw.Draw(image)

    transformed_annotations = np.copy(annotations)
    transformed_annotations[:, [1, 3]] = annotations[:, [1, 3]] * w
    transformed_annotations[:, [2, 4]] = annotations[:, [2, 4]] * h

    transformed_annotations[:, 1] = transformed_annotations[:, 1] - (transformed_annotations[:, 3] / 2)
    transformed_annotations[:, 2] = transformed_annotations[:, 2] - (transformed_annotations[:, 4] / 2)
    transformed_annotations[:, 3] = transformed_annotations[:, 1] + transformed_annotations[:, 3]
    transformed_annotations[:, 4] = transformed_annotations[:, 2] + transformed_annotations[:, 4]

    for ann in transformed_annotations:
        obj_cls, x0, y0, x1, y1 = ann
        plotted_image.rectangle(((x0, y0), (x1, y1)))
        plotted_image.text((x0, y0 - 10), class_id_to_name_mapping[(int(obj_cls))])

    plt.imshow(np.array(image))
    plt.show()

# Get any random annotation file
annotation_file = random.choice(annotations)
with open(annotation_file, "r") as file:
    annotation_list = file.read().split("\n")[:-1]
    annotation_list = [x.split(" ") for x in annotation_list]
    annotation_list = [[float(y) for y in x] for x in annotation_list]

# Get the corresponding image file
image_file = annotation_file.replace("annotations", "images").replace("txt", "png")
assert os.path.exists(image_file)

# Load the image
image = Image.open(image_file)

# Plot the Bounding Box
plot_bounding_box(image, annotation_list)
Great, we are able to recover the correct annotation from the YOLO v5 format. This means we have implemented the conversion function properly.
Partition the Dataset
Next we partition the dataset into train, validation, and test sets containing 80%, 10%, and 10% of the data, respectively. You can change the split values as you see fit.
In [25]:
# Read images and annotations
images = [os.path.join('data/Road_Sign_Dataset/images', x) for x in os.listdir('data/Road_Sign_Dataset/images')]
annotations = [os.path.join('data/Road_Sign_Dataset/annotations', x) for x in os.listdir('data/Road_Sign_Dataset/annotations') if x[-3:] == "txt"]
images.sort()
annotations.sort()
# Split the dataset into train-valid-test splits
train_images, val_images, train_annotations, val_annotations = train_test_split(images, annotations, test_size = 0.2, random_state = 1)
val_images, test_images, val_annotations, test_annotations = train_test_split(val_images, val_annotations, test_size = 0.5, random_state = 1)
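A quick look at the resulting split sizes (with 877 images, an 80/10/10 split should give roughly 701/88/88):
# Sanity check the 80/10/10 split sizes
print(len(train_images), len(val_images), len(test_images))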
Create the folders to keep the splits.
In [38]:
!mkdir data/Road_Sign_Dataset/images/train data/Road_Sign_Dataset/images/val data/Road_Sign_Dataset/images/test
!mkdir data/Road_Sign_Dataset/annotations/train data/Road_Sign_Dataset/annotations/val data/Road_Sign_Dataset/annotations/test
Move the files to their respective folders.
In [41]:
# Utility function to move images
def move_files_to_folder(list_of_files, destination_folder):
    for f in list_of_files:
        try:
            shutil.move(f, destination_folder)
        except:
            print(f)
            assert False
# Move the splits into their folders
move_files_to_folder(train_images, 'data/Road_Sign_Dataset/images/train')
move_files_to_folder(val_images, 'data/Road_Sign_Dataset/images/val/')
move_files_to_folder(test_images, 'data/Road_Sign_Dataset/images/test/')
move_files_to_folder(train_annotations, 'data/Road_Sign_Dataset/annotations/train/')
move_files_to_folder(val_annotations, 'data/Road_Sign_Dataset/annotations/val/')
move_files_to_folder(test_annotations, 'data/Road_Sign_Dataset/annotations/test/')
Rename the annotations folder to labels, as this is where YOLO v5 expects the annotations to be located.
In [42]:
!mv data/Road_Sign_Dataset/annotations data/Road_Sign_Dataset/labels
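If everything went well, images and labels now sit side by side per split; a minimal check of the final layout:
# Verify the final directory layout expected by the data loader
for split in ('train', 'val', 'test'):
    n_imgs = len(os.listdir('data/Road_Sign_Dataset/images/' + split))
    n_lbls = len(os.listdir('data/Road_Sign_Dataset/labels/' + split))
    print(split, n_imgs, 'images,', n_lbls, 'labels')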
From YOLO v1 to YOLO v5
Network Architecture Evolution
Anchor Boxes
Bigger Network with ResNet
Multi-Scale Detector
YOLO v3 uses a considerably larger network than YOLO v2 to perform feature extraction. This network is known as Darknet-53, as the whole network comprises 53 convolutional layers with shortcut connections (Redmon & Farhadi, 2018).
The YOLO v3 network has 53 convolutional layers (Redmon & Farhadi, 2018); the code therefore composes several repeated components, among them:
YOLO v3 Detection Layer
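The actual layer definitions live in the kyolov5 package (and differ between YOLO versions), but as a minimal sketch of the idea, a Darknet-style residual block built from the Keras layers imported at the top would look roughly like this:
def conv_bn_leaky(x, filters, kernel_size, strides=1):
    # basic Darknet unit: convolution + batch normalization + leaky ReLU
    x = Conv2D(filters, kernel_size, strides=strides, padding='same', use_bias=False)(x)
    x = BatchNormalization()(x)
    return LeakyReLU(alpha=0.1)(x)

def residual_block(x, filters):
    # 1x1 bottleneck then 3x3 convolution, added back onto the shortcut connection
    shortcut = x
    x = conv_bn_leaky(x, filters // 2, 1)
    x = conv_bn_leaky(x, filters, 3)
    return add([shortcut, x])

# tiny demo: one downsampling convolution followed by one residual block
inputs = Input(shape=(640, 640, 3))
x = conv_bn_leaky(inputs, 64, 3, strides=2)
x = residual_block(x, 64)
demo_backbone = Model(inputs, x)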
YOLO v4
The original YOLO algorithm was written by Joseph Redmon, who is also the author of a custom framework called Darknet. After five years of research and development, up to the third generation of YOLO (YOLOv3), Joseph Redmon announced his withdrawal from the field of computer vision and discontinued developing the YOLO algorithm, out of concern that his research could be abused in military applications.
However, he did not dispute the continuation of research by any individual or organization based on the early ideas of the YOLO algorithm. In April 2020, Alexey Bochkovskiy, a Russian researcher and engineer who had maintained the Darknet framework and implemented the previous YOLO architectures in C based on Joseph Redmon's theoretical ideas, cooperated with Chien-Yao Wang and Hong-Yuan Mark Liao and published YOLOv4 (Bochkovskiy, 2020).
The common point of all object detection architectures is that the input image features are compressed down through a feature extractor (Backbone) and then forwarded to the object detector (comprising the Detection Neck and Detection Head), as in Figure 15.
The Detection Neck (or Neck) works as a feature aggregator, tasked with mixing and combining the features formed in the Backbone to prepare for the detection step in the Detection Head (or Head).
The difference appears in the Head, which is responsible for making detections, i.e. localization and classification for each bounding box. A two-stage detector performs these two tasks separately and combines their results later (Sparse Detection), whereas a one-stage detector performs them at the same time (Dense Detection), as in Figure 15 (Solawetz, 2020). YOLO is a one-stage detector, hence: You Only Look Once.
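In code terms the one-stage composition is just a chain of the three parts; the names below are purely illustrative (they are not the kyolov5 API):
def one_stage_detector(image, backbone, neck, head):
    # backbone, neck and head are placeholders for the three architectural parts
    features = backbone(image)   # multi-scale feature maps from the feature extractor
    fused = neck(features)       # feature aggregation across scales
    return head(fused)           # dense predictions: boxes, objectness and classes per cell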
The YOLOv4 authors performed a series of experiments with many of the most advanced innovations in computer vision for each part of the architecture (Figure 16) (Bochkovskiy, et al., 2020).
Keras implementation
YOLOv5 in pure TensorFlow 2
yaml file to configure the model
custom data training
mosaic data augmentation
label encoding by iou or wh ratio of anchor
positive sample augment
multi-gpu training
detailed code comments
full of drawbacks with huge space to improve
from https://github.com/LongxingTan/Yolov5
NB: the code of this library has been adapted to run under JupyterLab and on our custom dataset; it is stored in the kyolov5 directory under the JupyterLab root directory.
So let's implement training on our dataset.
In [2]:
params={}
params['log_dir']='kyolov5/logs'
params['train_annotations_dir']='data/Road_Sign_Dataset/labels/train'
params['test_annotations_dir']='data/Road_Sign_Dataset/labels/test'
params['val_annotations_dir']='data/Road_Sign_Dataset/labels/val'
params['class_name_dir']='kyolov5/data/voc2012/VOCdevkit/VOC2012/voc2012.names'
params['yaml_dir']='kyolov5/yolo-m-mish.yaml'
params['checkpoint_dir']='kyolov5/weights'
params['saved_model_dir']='kyolov5/weights/yolov5'
params['n_epochs']=100
params['batch_size']=8
params['multi_gpus']=False
params['init_learning_rate']=3e-4
params['warmup_learning_rate']=1e-6
params['warmup_epochs']=2
params['img_size']=640
params['mosaic_data']=False
params['augment_data']=True
params['anchor_assign_method']='wh'
params['anchor_positive_augment']=True
params['label_smoothing']=0.02
In [3]:
log_writer = tf.summary.create_file_writer(params['log_dir'])
global_step = tf.Variable(0, trainable=False, dtype=tf.int64)
In [4]:
# Get the annotations
train_annotations = [os.path.join(params['train_annotations_dir'], x) for x in os.listdir(params['train_annotations_dir']) if x[-3:] == "txt"]
print("found :", len(train_annotations), " example :", train_annotations[0])
if 'train' in params['train_annotations_dir']:
    print("OK")
tt=train_annotations[0].replace("labels","images")
tt=tt[:-3]
tt=tt+"png"
print(tt)
found : 701 example : data/Road_Sign_Dataset/labels/train/road260.txt OK data/Road_Sign_Dataset/images/train/road260.png
In [5]:
annotation="data.img.jpg 0,0,10,10,0 5,5,15,15,1"
example=annotation.split()
a=[list(map(float, box.split(',')[0: 5])) for box in example[1:]]
print(a)
label = np.array([list(map(float, box.split(',')[0: 5])) for box in example[1:]])
print(label)
[[0.0, 0.0, 10.0, 10.0, 0.0], [5.0, 5.0, 15.0, 15.0, 1.0]] [[ 0. 0. 10. 10. 0.] [ 5. 5. 15. 15. 1.]]
In [26]:
# check how to adapt label loading - in trainer.py, labels must be an ndarray with a 2-dim shape
print(tt)
label = np.loadtxt(train_annotations[0])
print("label 1",label,len(label.shape))
label=label[np.newaxis,:]
print("after label 1",label,len(label.shape))
#road826.txt
label = np.loadtxt('data/Road_Sign_Dataset/labels/train/road826.txt')
print("label 2", label,len(label.shape))
label=label[:,[1,2,3,4,0]]
print(label)
data/Road_Sign_Dataset/images/train/road260.png label 1 [2. 0.54 0.414 0.227 0.188] 1 after label 1 [[2. 0.54 0.414 0.227 0.188]] 2 label 2 [[3. 0.6 0.441 0.053 0.062] [0. 0.552 0.448 0.057 0.11 ] [0. 0.498 0.475 0.037 0.07 ] [0. 0.308 0.598 0.037 0.04 ] [0. 0.203 0.604 0.013 0.018] [0. 0.237 0.557 0.013 0.02 ] [3. 0.745 0.589 0.03 0.022] [3. 0.432 0.598 0.017 0.015]] 2 [[0.6 0.441 0.053 0.062 3. ] [0.552 0.448 0.057 0.11 0. ] [0.498 0.475 0.037 0.07 0. ] [0.308 0.598 0.037 0.04 0. ] [0.203 0.604 0.013 0.018 0. ] [0.237 0.557 0.013 0.02 0. ] [0.745 0.589 0.03 0.022 3. ] [0.432 0.598 0.017 0.015 3. ]]
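Wrapping the two adjustments demonstrated above (promoting single-box labels to 2-D and moving the class id to the last column) into a small helper, something along these lines could be used when adapting the loader:
def load_label(txt_path):
    # read a YOLO-style label file and return an (N, 5) array ordered x, y, w, h, class
    label = np.loadtxt(txt_path)
    if len(label.shape) == 1:          # a single box comes back 1-D, promote it to 2-D
        label = label[np.newaxis, :]
    return label[:, [1, 2, 3, 4, 0]]   # move the class id from the first to the last column

print(load_label('data/Road_Sign_Dataset/labels/train/road826.txt').shape)   # (8, 5) for this file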
In [7]:
trainer = Trainer(params)
In [8]:
dataReader = DataReader(params['train_annotations_dir'],
img_size=params['img_size'],
transforms=transforms,
mosaic=params['mosaic_data'],
augment=params['augment_data'],
filter_idx=None)
Load examples : 701
In [9]:
data_loader = DataLoader(dataReader,
trainer.anchors,
trainer.stride,
params['img_size'],
params['anchor_assign_method'],
params['anchor_positive_augment'])
In [10]:
train_dataset = data_loader(batch_size=8, anchor_label=True)
train_dataset.len = len(dataReader)
In [11]:
x=train_dataset.take(1)
print(x)
<TakeDataset element_spec=(TensorSpec(shape=(None, 640, 640, 3), dtype=tf.float32, name=None), (TensorSpec(shape=(None, 80, 80, 3, 6), dtype=tf.float32, name=None), TensorSpec(shape=(None, 40, 40, 3, 6), dtype=tf.float32, name=None), TensorSpec(shape=(None, 20, 20, 3, 6), dtype=tf.float32, name=None)))>
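The element spec makes the multi-scale target encoding visible: assuming the usual YOLO v5 strides of 8, 16 and 32, a 640-pixel input gives 80x80, 40x40 and 20x20 grids with 3 anchors per cell; the last dimension holds the encoded box plus objectness and class information (the exact encoding lives in kyolov5.load_data).
# Grid sizes for the three detection scales (assuming strides 8, 16, 32)
for stride in (8, 16, 32):
    grid = params['img_size'] // stride
    print("stride", stride, "-> grid", grid, "x", grid, ", 3 anchors per cell")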
In [12]:
trainer.train(train_dataset)
=> Epoch 100, Step 8800, Loss 183.78569
2022-06-16 09:36:48.132575: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them. WARNING:absl:Found untraced functions such as conv_layer_call_fn, conv_layer_call_and_return_conditional_losses, conv2d_1_layer_call_fn, conv2d_1_layer_call_and_return_conditional_losses, conv_2_layer_call_fn while saving (showing 5 of 322). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: kyolov5/weights/yolov5/assets
INFO:tensorflow:Assets written to: kyolov5/weights/yolov5/assets
pb model saved in kyolov5/weights/yolov5
Check detection
In [13]:
import cv2
from kyolov5.image_utils import resize_image, resize_back
from kyolov5.vis_data import draw_box
from kyolov5.post_process import batch_non_max_suppression
In [14]:
def image_demo(img, model, img_size=640, class_names=None, conf_threshold=0.4, iou_threshold=0.3):
    original_shape = img.shape

    img_input = resize_image(img, target_sizes=img_size)
    img_input = img_input[np.newaxis, ...].astype(np.float32)
    img_input = img_input / 255.

    pred_bbox = model(img_input)
    pred_bbox = [tf.reshape(x, (tf.shape(x)[0], -1, tf.shape(x)[-1])) for x in pred_bbox]
    pred_bbox = tf.concat(pred_bbox, axis=1)  # batch_size * -1 * (num_class + 5)

    bboxes = batch_non_max_suppression(pred_bbox, conf_threshold=conf_threshold, iou_threshold=iou_threshold)
    bboxes = bboxes[0].numpy()  # batch is 1 for detect
    bboxes = resize_back(bboxes, target_sizes=img_size, original_shape=original_shape)  # adjust box to original size

    if bboxes.any():
        image = draw_box(img, np.array(bboxes), class_names)
        cv2.imwrite('./demo.jpg', cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
    else:
        print('No box detected')
In [15]:
def test_image_demo(img_dir, model_dir, img_size=640, class_name_dir=None, conf_threshold=0.4, iou_threshold=0.3):
    img = cv2.imread(img_dir)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(img)
    plt.title('Matplotlib')  # Give this plot a title, so I know it's from matplotlib and not cv2
    plt.show()

    if class_name_dir:
        class_names = {idx: name for idx, name in enumerate(open(class_name_dir).read().splitlines())}
    else:
        class_names = None

    model = tf.saved_model.load(model_dir)
    image_demo(img, model, img_size=img_size, class_names=class_names,
               conf_threshold=conf_threshold, iou_threshold=iou_threshold)
In [20]:
import cv2
test_image_demo('data/Road_Sign_Dataset/images/train/road556.png','kyolov5/weights/yolov5',conf_threshold=0.1, iou_threshold=0.1)
In [21]:
img = cv2.imread('demo.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img)
plt.title('Res') #Give this plot a title,
#so I know it's from matplotlib and not cv2
plt.show()
Training Options
For reference, here is how training options are set via flags when training with the upstream yolov5 repository's train.py (see the example invocation after this list).
img: Size of image. The image is a square one. The original image is resized while maintaining the aspect ratio. The longer side of the image is resized to this number. The shorter side is padded with grey color.
batch: The batch size
epochs: Number of epochs to train for
data: Data YAML file that contains information about the dataset (path of images, labels)
workers: Number of CPU workers
cfg: Model architecture. There are 4 choices available: yolov5s.yaml, yolov5m.yaml, yolov5l.yaml, yolov5x.yaml. The size and complexity of these models increases in that order, and you can choose a model which suits the complexity of your object detection task. In case you want to work with a custom architecture, you will have to define a YAML file in the models folder specifying the network architecture.
weights: Pretrained weights you want to start training from. If you want to train from scratch, use --weights ' '
name: Various things about training such as train logs. Training weights would be stored in a folder named runs/train/name
hyp: YAML file that describes hyperparameter choices. For examples of how to define hyperparameters, see data/hyp.scratch.yaml. If unspecified, the file data/hyp.scratch.yaml is used.
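For reference, a typical invocation of the upstream yolov5 train.py with these flags might look like the following (this assumes the ultralytics repository has been cloned next to the data folder; it is not how the Keras training above was run):
!cd yolov5; python train.py --img 640 --batch 32 --epochs 100 --data road_sign_data.yaml --weights yolov5s.pt --workers 8 --name road_sign_run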
Data Config File
Details for the dataset you want to train your model on are defined by the data config YAML file. The following parameters have to be defined in a data config file:
train, test, and val: Locations of train, test, and validation images.
nc: Number of classes in the dataset.
names: Names of the classes in the dataset. The index of the classes in this list would be used as an identifier for the class names in the code.
Create a new file called road_sign_data.yaml and place it in the yolov5/data folder, then populate it along the lines of the sketch below.
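The YAML itself is not reproduced in the source, but based on the fields listed above and the directory layout we created, it would look roughly like this (written out from the notebook for convenience; the relative paths are assumptions and should be adjusted to where the yolov5 repository sits):
# Sketch of road_sign_data.yaml based on the fields described above (paths are assumptions)
road_sign_yaml = """\
train: ../data/Road_Sign_Dataset/images/train/
val: ../data/Road_Sign_Dataset/images/val/
test: ../data/Road_Sign_Dataset/images/test/

# number of classes
nc: 4

# class names, in the same order as class_name_to_id_mapping
names: ["trafficlight", "stop", "speedlimit", "crosswalk"]
"""
with open('road_sign_data.yaml', 'w') as f:
    f.write(road_sign_yaml)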
Links
YOLOv4: Optimal Speed and Accuracy of Object Detection (paper)
https://arxiv.org/abs/2004.10934v1
From YOLO v1 to YOLO v5 (thesis PDF)
https://www.theseus.fi/bitstream/handle/10024/452552/Do_Thuan.pdf?sequence=2&isAllowed=y