Tech It Yourself

## CNN - Image Resizing VS Padding (keeping aspect ratio or not?)

August 09, 2020

Zero padding or resize? According to Jeremy Howard, padding a big piece of the image (64x160 pixels) has the following effect: the CNN has to learn that the black part of the image is not relevant and does not help distinguish between the classes (in a classification setting), since there is no correlation between the pixels in the black part and belonging to a given class. Because you are not hard-coding this, the CNN has to learn it by gradient descent, which may take some epochs. For this reason, padding is fine if you have lots of images and computational power, but if you are on a budget for either, resizing should work better.

But: say you normalize all the pixels to [0, 1], so the black pixels are all 0s. During convolution, any kernel then outputs 0 for those pixels, so this is in fact very easy to learn, and it is almost automatic.
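
To illustrate the trade-off, here is a minimal NumPy sketch (the function names and the nearest-neighbor resize are illustrative, not from any particular library) of "letterbox" padding, which keeps the aspect ratio by scaling the longer side and zero-padding the rest:

```python
import numpy as np

def resize_nn(img, nh, nw):
    # Minimal nearest-neighbor resize, NumPy only, for illustration.
    h, w = img.shape[:2]
    ys = np.arange(nh) * h // nh
    xs = np.arange(nw) * w // nw
    return img[ys][:, xs]

def letterbox(img, size=224):
    # Keep the aspect ratio: scale the longer side to `size`, zero-pad the rest.
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = max(int(h * scale), 1), max(int(w * scale), 1)
    resized = resize_nn(img, nh, nw)
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out
```

A plain resize to (224, 224) would instead stretch a non-square image, losing the aspect ratio but leaving no black border for the network to learn to ignore.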

## Convert FC layers to Fully Conv layers

August 09, 2020
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Convolution2D

def to_fully_conv(model):
    """Replace every Dense layer with an equivalent Convolution2D layer."""
    new_model = Sequential()
    new_model.add(InputLayer(input_shape=(None, None, 3), name="input_new"))

    flattened_ipt = False
    f_dim = None

    for index, layer in enumerate(model.layers):
        layer._name = layer._name + str(index)

        if "InputLayer" in str(layer):
            continue
        elif "Flatten" in str(layer):
            # Remember the shape entering the Flatten; the layer itself is dropped.
            flattened_ipt = True
            f_dim = layer.input_shape
            continue
        elif "Dense" in str(layer):
            input_shape = layer.input_shape
            W, b = layer.get_weights()
            output_dim = b.shape[0]
            if flattened_ipt:
                # First Dense after Flatten: the kernel covers the whole feature map.
                shape = (f_dim[1], f_dim[2], f_dim[3], output_dim)
                new_layer = Convolution2D(output_dim,
                                          (f_dim[1], f_dim[2]),
                                          strides=(1, 1),
                                          activation=layer.activation,
                                          weights=[W.reshape(shape), b])
                flattened_ipt = False
            else:
                # Later Dense layers become 1x1 convolutions.
                shape = (1, 1, input_shape[1], output_dim)
                new_layer = Convolution2D(output_dim,
                                          (1, 1),
                                          strides=(1, 1),
                                          activation=layer.activation,
                                          weights=[W.reshape(shape), b])
        else:
            new_layer = layer
        new_model.add(new_layer)

    return new_model
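
Why reshaping the Dense weights this way is valid: a Dense layer applied to a flattened H x W x C feature map computes exactly the same dot product as a convolution whose kernel covers the entire map. A small NumPy check, with shapes chosen arbitrarily for illustration and assuming the default channels_last layout:

```python
import numpy as np

# A Dense layer on a flattened H x W x C feature map equals a convolution
# whose kernel spans the whole map.
H, W, C, D = 2, 2, 3, 5
rng = np.random.default_rng(0)
fmap = rng.normal(size=(H, W, C))
W_dense = rng.normal(size=(H * W * C, D))
b = rng.normal(size=(D,))

# Dense path: flatten, then matrix multiply.
dense_out = fmap.reshape(-1) @ W_dense + b

# Conv path: reshape the Dense weights into an (H, W, C, D) kernel,
# exactly as to_fully_conv does, and contract over all spatial positions.
kernel = W_dense.reshape(H, W, C, D)
conv_out = np.tensordot(fmap, kernel, axes=([0, 1, 2], [0, 1, 2])) + b

assert np.allclose(dense_out, conv_out)
```

The 1x1-convolution case is the same identity with H = W = 1.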


## new data augmentation methods

August 09, 2020
Existing data augmentation methods can be roughly divided into three categories: spatial transformation, color distortion, and information dropping. Spatial transformation covers a set of basic augmentations, such as random scaling, cropping, flipping and random rotation, which are widely used in model training. Color distortion, which includes changes to brightness, hue, etc., is also used in several models. These two categories aim to transform the training data to better simulate real-world data by changing some channels of information.
Information dropping has recently been widely employed for its effectiveness and/or efficiency. It includes Random Erasing, Cutout, and Hide-and-Seek (HaS). The idea is that by deleting a level of information in the image, CNNs are forced to learn originally less sensitive or important information and to increase their receptive field, resulting in a notable increase in the robustness of the model.

1. Random Erasing Data Augmentation
In training, Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values. In this process, training images with various levels of occlusion are generated, which reduces the risk of over-fitting and makes the model robust to occlusion. Random Erasing is parameter learning free, easy to implement, and can be integrated with most of the CNN-based recognition models. Random Erasing is complementary to commonly used data augmentation techniques such as random cropping and flipping, and yields consistent improvement over strong baselines in image classification, object detection and person re-identification.
Random Erasing + Random Cropping:
Random cropping is an effective data augmentation approach: it reduces the contribution of the background to the CNN's decision, and can base learning on the presence of parts of the object instead of the whole object. In comparison, Random Erasing retains the overall structure of the object and only occludes some of its parts. In addition, the pixels of the erased region are re-assigned random values, which can be viewed as adding noise to the image. In our experiment, we show that these two methods are complementary for data augmentation.
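
As a sketch (not the authors' implementation; the area and aspect-ratio ranges below are commonly used defaults), Random Erasing can be written as:

```python
import numpy as np

def random_erasing(img, p=0.5, area=(0.02, 0.4), aspect=(0.3, 3.3), rng=None):
    # With probability p, pick a random rectangle and fill it with random values.
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() > p:
        return img
    h, w = img.shape[:2]
    for _ in range(100):                       # retry until the box fits
        target = rng.uniform(*area) * h * w    # target erased area in pixels
        ratio = rng.uniform(*aspect)           # height/width aspect ratio
        eh, ew = int(np.sqrt(target * ratio)), int(np.sqrt(target / ratio))
        if eh < h and ew < w:
            y = rng.integers(0, h - eh)
            x = rng.integers(0, w - ew)
            img = img.copy()
            img[y:y + eh, x:x + ew] = rng.random((eh, ew) + img.shape[2:])
            return img
    return img
```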

2. Improved Regularization of Convolutional Neural Networks with Cutout
Due to the model capacity required for CNNs to capture complex representations, they are often susceptible to overfitting and therefore require proper regularization in order to generalize well. This paper shows that the simple regularization technique of randomly masking out square regions of the input during training, called cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance.

3. Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization
'Hide-and-Seek' is a weakly-supervised framework that aims to improve object localization in images and action localization in videos. Most existing weakly-supervised methods localize only the most discriminative parts of an object rather than all relevant parts, which leads to suboptimal performance. Our key idea is to hide patches in a training image randomly, forcing the network to seek other relevant parts when the most discriminative part is hidden. Our approach only needs to modify the input image and can work with any network designed for object localization. During testing, we do not need to hide any patches. Our Hide-and-Seek approach obtains superior performance compared to previous methods for weakly-supervised object localization on the ILSVRC dataset. We also demonstrate that our framework can be easily extended to weakly-supervised action localization. The RGB value of a hidden pixel is set to the mean RGB vector of the images over the entire dataset.

4. GridMask Data Augmentation
Intriguingly, we found that a successful information-dropping method should achieve a reasonable balance between deleting and retaining regional information in the images.

Existing information-dropping algorithms have different chances of achieving this balance. Both Cutout and Random Erasing delete only one continuous region of the image; depending on its size and location, that region stands a good chance of covering either the whole object or none of it, so the imbalance between the two conditions is obvious. HaS instead divides the picture evenly into small squares and deletes them randomly, which is more effective but still has a considerable chance of continuously deleting or retaining regions. Some unsuccessful examples of existing methods are shown in the original paper.

Surprisingly, a very simple strategy balances these two conditions statistically better: using structured dropping regions, such as deleting uniformly distributed square regions. This is our proposed information removal method, named GridMask.
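
A minimal sketch of the GridMask idea (parameter names and defaults are illustrative, not from the paper's code): inside every `d x d` grid cell, zero out a square of side `d * ratio`, at a random grid offset:

```python
import numpy as np

def gridmask(img, d=32, ratio=0.5, rng=None):
    # Delete a uniform grid of square regions: within every d x d cell,
    # zero out a square of side r = d * ratio.
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    r = int(d * ratio)
    oy, ox = rng.integers(d, size=2)           # random grid offset
    mask = np.ones((h, w), dtype=img.dtype)
    for y in range(-d + oy, h, d):
        for x in range(-d + ox, w, d):
            mask[max(y, 0):max(y + r, 0), max(x, 0):max(x + r, 0)] = 0
    return img * mask[..., None] if img.ndim == 3 else img * mask
```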

5. mixup: Beyond Empirical Risk Minimization
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
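
The core of mixup is a few lines; this sketch assumes one-hot labels and uses the Beta(alpha, alpha) sampling described in the paper:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    # Sample a mixing coefficient lam ~ Beta(alpha, alpha) and form convex
    # combinations of both the inputs and the (one-hot) labels.
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```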

6. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Current methods for regional dropout remove informative pixels from training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it leads to information loss and inefficiency during training. We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images, and the ground-truth labels are mixed proportionally to the area of the patches. By making efficient use of training pixels while retaining the regularization effect of regional dropout, CutMix improves the model's robustness against input corruptions and its out-of-distribution detection performance.
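
A sketch of CutMix for a single pair of images (the bounding-box sampling is simplified relative to the paper's code): paste a random rectangle from one image into the other and mix the labels by the actual pasted area:

```python
import numpy as np

def cutmix(img1, y1, img2, y2, rng=None):
    # Paste a random rectangle from img2 into img1; mix the one-hot labels
    # by the pasted area's fraction of the image.
    rng = np.random.default_rng() if rng is None else rng
    h, w = img1.shape[:2]
    lam = rng.beta(1.0, 1.0)                         # target keep-ratio for img1
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)        # random box center
    y0, y1_ = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1_ = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    out = img1.copy()
    out[y0:y1_, x0:x1_] = img2[y0:y1_, x0:x1_]
    lam_adj = 1 - (y1_ - y0) * (x1_ - x0) / (h * w)  # actual area kept from img1
    return out, lam_adj * y1 + (1 - lam_adj) * y2
```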

7. Mosaic data augmentation
Mosaic data augmentation combines 4 training images into one in certain ratios (instead of only two, as in CutMix). Mosaic is the first new data augmentation technique introduced in YOLOv4. It allows the model to learn to identify objects at a smaller scale than normal, and it also significantly reduces the need for a large mini-batch size during training.
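
A simplified, image-only sketch of the mosaic idea (a real detection pipeline must also remap the bounding boxes, which is omitted here): pick a random center point and paste one image per quadrant:

```python
import numpy as np

def mosaic(imgs, size=256, seed=0):
    """Combine 4 images into one: random center point, one image per quadrant."""
    rng = np.random.default_rng(seed)
    assert len(imgs) == 4
    cx, cy = rng.integers(size // 4, 3 * size // 4, size=2)
    out = np.zeros((size, size, 3), dtype=imgs[0].dtype)
    regions = [(0, cy, 0, cx), (0, cy, cx, size),
               (cy, size, 0, cx), (cy, size, cx, size)]
    for img, (y0, y1, x0, x1) in zip(imgs, regions):
        out[y0:y1, x0:x1] = img[:y1 - y0, :x1 - x0]  # crop each source's corner
    return out
```
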
8. Class label smoothing
Generally, the correct classification for a bounding box is represented as a one-hot vector of classes [0, 0, 0, 1, 0, 0, ...] and the loss function is calculated based on this representation. However, when a model becomes overly sure of a prediction close to 1.0, it is often wrong, overfit, and overlooking the complexities of the other predictions in some way. Following this intuition, it is more reasonable to encode the class label representation so that it reflects this uncertainty to some degree. Naturally, the authors choose 0.9, so [0, 0, 0, 0.9, 0, ...] represents the correct class.
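
A common formulation of label smoothing (slightly more general than the flat 0.9 above: it spreads eps of the probability mass uniformly over all classes, so the result still sums to 1) is:

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    # Scale the one-hot target by (1 - eps) and spread eps uniformly
    # over all K classes.
    n = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / n
```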

https://blog.roboflow.ai/yolov4-data-augmentation/

## Visualize the heatmap - GradCAM - Keras

August 07, 2020

Steps:

1) Compute the model output and the last convolutional layer's output for the image.

2) Find the probability of the winning class.

3) Compute the gradient of the winning class with respect to the last convolutional layer.

4) Weight the last convolutional layer's output by the gradient, and normalize it for visualization.

import numpy as np
import cv2
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image

model = InceptionV3(weights='imagenet')

ORIGINAL = 'cat.png'
DIM = 299

# 1) Model output and last conv layer output for the image
img = image.load_img(ORIGINAL, target_size=(DIM, DIM))
x = np.expand_dims(image.img_to_array(img), axis=0)
x = preprocess_input(x)

last_conv_layer = model.get_layer('conv2d_93')
grad_model = tf.keras.models.Model(model.inputs,
                                   [last_conv_layer.output, model.output])

with tf.GradientTape() as tape:
    conv_out, preds = grad_model(x)
    # 2) Probability of the winning class
    class_out = preds[:, tf.argmax(preds[0])]

# 3) Gradient of the winning class w.r.t. the last conv layer
grads = tape.gradient(class_out, conv_out)
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

# 4) Weight the conv feature maps by the pooled gradients, normalize
heatmap = tf.reduce_mean(conv_out[0] * pooled_grads, axis=-1).numpy()
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)

# Overlay the heatmap on the original image
INTENSITY = 0.5
img = cv2.imread(ORIGINAL)
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
heatmap = cv2.applyColorMap(np.uint8(255 * heatmap), cv2.COLORMAP_JET)
cv2.imwrite('output.jpg', heatmap * INTENSITY + img)
Figure: Input and Output

## virtual gpu with tensorflow

June 09, 2020
import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "No GPUs found"

# Split the first physical GPU into two 100 MB logical (virtual) GPUs.
tf.config.experimental.set_virtual_device_configuration(
    physical_devices[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=100),
     tf.config.experimental.VirtualDeviceConfiguration(memory_limit=100)])

try:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
except (ValueError, RuntimeError):
    print('Cannot set memory growth when virtual devices are configured')

try:
    tf.config.experimental.set_virtual_device_configuration(
        physical_devices[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=100),
         tf.config.experimental.VirtualDeviceConfiguration(memory_limit=100)])
except (ValueError, RuntimeError):
    print('Cannot modify the virtual devices once they have been initialized')

logical_devices = tf.config.experimental.list_logical_devices('GPU')
print('------------------------------------------------')
print(logical_devices)
print('------------------------------------------------')

# TF2 runs eagerly, so no Session is needed: just place ops on each logical GPU.
with tf.device(logical_devices[0].name):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    print(tf.matmul(a, b))

with tf.device(logical_devices[1].name):
    x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='x')
    y = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='y')
    print(tf.matmul(x, y))


## Face Recognition

May 04, 2020
Two main modes for face recognition:
- Face Verification (or authentication): a one-to-one mapping of a given face against a known identity.
- Face Identification (or recognition): a one-to-many mapping for a given face against a database of known faces.
Applications:
- Restrict access to a resource to one person, called face authentication.
- Confirm that the person matches their ID, called face verification.
- Assign a name to a face, called face identification.
A traditional face recognition pipeline consists of the following steps (a system may combine some or all of them into a single process):
- Face Detection: locate one or more faces in the image and mark with a bounding box.
- Face Alignment: normalize the face to be consistent with the database, such as geometry and photo-metrics.
- Feature Extraction: extract features from the face that can be used for the recognition task.
- Face Recognition: perform matching of the face against one or more known faces in a prepared database.

Experiment with FaceNet
FaceNet directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity:
faces of the same person have small distances and faces of distinct people have large distances.
FaceNet combines some of these steps into a single process.

Once this embedding (FaceNet embeddings as feature vectors) has been produced, then face verification simply involves thresholding the distance between the two embeddings; recognition becomes a k-NN classification problem.

FaceNet directly trains its output to be a compact 128-D embedding (128 bytes per face) using a triplet-based loss function. The triplets consist of two matching face thumbnails and a non-matching face thumbnail, and the loss aims to separate the positive pair from the negative by a distance margin. The thumbnails are tight crops of the face area; no 2D or 3D alignment is performed, other than scaling and translation.
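
Given two such embeddings, verification reduces to a single comparison; the threshold below is an illustrative value to be tuned on a validation set, not the paper's:

```python
import numpy as np

def is_same_person(emb1, emb2, threshold=1.1):
    # Verification: threshold the Euclidean distance between two
    # L2-normalized face embeddings. The threshold is an assumption here.
    return np.linalg.norm(emb1 - emb2) < threshold
```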

Demo
import os
import numpy as np
import cv2
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import load_model
from mtcnn import MTCNN

image_dir_basepath = '5-celebrity-faces-dataset/'
names = ['ben_afflek', 'elton_john', 'jerry_seinfeld', 'madonna', 'mindy_kaling']
FACENET_SIZE = (160, 160)

model_path = 'facenet_keras.h5'
model = load_model(model_path)
detector = MTCNN()

def l2_normalize(x, axis=-1, epsilon=1e-10):
    return x / np.sqrt(np.maximum(np.sum(np.square(x), axis=axis, keepdims=True), epsilon))

def load_and_align_images(filepaths, margin=10):
    aligned_images = []
    for filepath in filepaths:
        img = cv2.cvtColor(cv2.imread(filepath), cv2.COLOR_BGR2RGB)
        faces = detector.detect_faces(img)
        x, y, w, h = faces[0]['box']
        x, y = abs(x), abs(y)
        face = img[y - margin//2 : y + h + margin//2,
                   x - margin//2 : x + w + margin//2, :]
        aligned = cv2.resize(face, FACENET_SIZE) / 255.0
        aligned_images.append(aligned)
    return np.array(aligned_images)

def calc_embs(filepaths, margin=10, batch_size=1):
    aligned_images = load_and_align_images(filepaths, margin)
    pd = []
    for start in range(0, len(aligned_images), batch_size):
        pd.append(model.predict_on_batch(aligned_images[start:start+batch_size]))
    return l2_normalize(np.concatenate(pd))

def train(dir_basepath, names, max_num_img=10):
    labels, embs = [], []
    for name in names:
        dirpath = os.path.abspath(dir_basepath + name)
        filepaths = [os.path.join(dirpath, f) for f in os.listdir(dirpath)][:max_num_img]
        embs_ = calc_embs(filepaths)
        labels.extend([name] * len(embs_))
        embs.append(embs_)
    embs = np.concatenate(embs)
    le = LabelEncoder().fit(labels)
    y = le.transform(labels)
    clf = SVC(kernel='linear', probability=True).fit(embs, y)
    return le, clf

def infer(le, clf, filepaths):
    embs = calc_embs(filepaths)
    return le.inverse_transform(clf.predict(embs))

le, clf = train(image_dir_basepath + 'train/', names)
test_dirpath = image_dir_basepath + 'val/'
test_filepaths = []
for name in names:
    for f in os.listdir(test_dirpath + name):
        test_filepaths.append(test_dirpath + name + '/' + f)

pred = infer(le, clf, test_filepaths)
for filepath, p in zip(test_filepaths, pred):
    print(filepath)
    print(p)
    print('---------')



## Homogeneous Coordinates

November 10, 2019

### Problem: Two parallel lines can intersect.

Figure: The railroad gets narrower and meets at the horizon.
In Euclidean space (geometry), two parallel lines on the same plane cannot intersect and can never meet. This is common sense, familiar to everyone.

However, it is no longer true in projective space: for example, the train railroad in the picture becomes narrower as it moves farther from the eye. Finally, the two parallel rails meet at the horizon, which is a point at infinity.

Euclidean space (or Cartesian space) describes our 2D/3D geometry well, but it is not sufficient to handle projective space (in fact, Euclidean geometry is a subset of projective geometry). The Cartesian coordinates of a 2D point can be expressed as (x, y).

What if this point goes far away, to infinity? The point at infinity would be (∞, ∞), which is meaningless in Euclidean space. Parallel lines should meet at infinity in projective space, but cannot do so in Euclidean space. Mathematicians have discovered a way to solve this issue.

### Solution: Homogeneous Coordinates

Homogeneous coordinates, introduced by August Ferdinand Möbius, make calculations of graphics and geometry possible in projective space. Homogeneous coordinates are a way of representing N-dimensional coordinates with N+1 numbers.

To make 2D homogeneous coordinates, we simply add an additional variable, w, to the existing coordinates. Thus, a point (X, Y) in Cartesian coordinates becomes (x, y, w) in homogeneous coordinates, and X and Y are re-expressed with x, y and w as:
X = x/w
Y = y/w

For instance, the Cartesian point (1, 2) becomes (1, 2, 1) in homogeneous coordinates. If the point (1, 2) moves toward infinity, it becomes (∞, ∞) in Cartesian coordinates, but (1, 2, 0) in homogeneous coordinates, because (1/0, 2/0) → (∞, ∞). Notice that we can express the point at infinity without using "∞".
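
The conversions in both directions are one-liners; a small Python sketch (function names are mine, for illustration):

```python
def to_cartesian(p):
    # (x, y, w) -> (x/w, y/w); w == 0 encodes a point at infinity.
    x, y, w = p
    if w == 0:
        raise ValueError("point at infinity has no Cartesian equivalent")
    return (x / w, y / w)

def to_homogeneous(p, w=1.0):
    # (X, Y) -> (X*w, Y*w, w) for any nonzero w: all such triples are
    # "homogeneous" representations of the same Cartesian point.
    X, Y = p
    return (X * w, Y * w, w)
```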

### Why is it called "homogeneous"?

As mentioned before, to convert from homogeneous coordinates (x, y, w) to Cartesian coordinates, we simply divide x and y by w:

(x, y, w) → (x/w, y/w)

Converting from homogeneous to Cartesian reveals an important fact. Consider the following example:

(1, 2, 3) → (1/3, 2/3)
(2, 4, 6) → (2/6, 4/6) = (1/3, 2/3)
(4, 8, 12) → (4/12, 8/12) = (1/3, 2/3)

As you can see, the points (1, 2, 3), (2, 4, 6) and (4, 8, 12) correspond to the same Euclidean point (1/3, 2/3). More generally, any scalar multiple (1a, 2a, 3a) is the same point (1/3, 2/3) in Euclidean space. These points are therefore "homogeneous": they represent the same point in Euclidean space (or Cartesian space). In other words, homogeneous coordinates are scale invariant.
You can imagine the w value as the distance between a projector and its screen: as w increases, the screen moves away from the projector.

Source: http://www.songho.ca/math/homogeneous/homogeneous.html