# Siamese neural network

Source: en.wikipedia.org/wiki/Siamese_neural_network

A **Siamese neural network** (sometimes called a **twin neural network**) is an artificial neural network that uses the same weights while working in tandem on two different input vectors to compute comparable output vectors.^{[1]}^{[2]}^{[3]} Often one of the output vectors is precomputed, thus forming a baseline against which the other output vector is compared. This is similar to comparing fingerprints but can be described more technically as a distance function for locality-sensitive hashing.^{[citation needed]}

It is possible to build a structure that is functionally similar to a Siamese network but implements a slightly different function. This is typically used for comparing similar instances drawn from sets of different types.^{[citation needed]}

Applications of similarity measures where a twin network might be used include recognizing handwritten checks, automatically detecting faces in camera images, and matching queries with indexed documents. Perhaps the best-known application of twin networks is face recognition, where embeddings of images of known people are precomputed and compared with an image from a turnstile or similar. Although it is not obvious at first, this covers two slightly different problems. One is recognizing a person among a large number of other people, which is the facial recognition problem; DeepFace is an example of such a system.^{[3]} In its most extreme form, this means recognizing a single person at a train station or airport. The other is face verification: checking whether the photo in a pass shows the same person as the one claiming to be its holder. The twin network might be the same in both cases, but the implementation can be quite different.
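The difference between the two problems can be sketched in a few lines, assuming the twin network has already turned every image into an embedding vector. In this hypothetical example, `gallery` holds precomputed embeddings of enrolled people, `identify` performs recognition by nearest-neighbour search over the whole gallery, and `verify` compares a probe against one claimed identity with a distance threshold. The function names, the use of Euclidean distance and the threshold value are illustrative assumptions, not the method of DeepFace or any other specific system.

```python
from typing import Dict

import numpy as np


def identify(probe: np.ndarray, gallery: Dict[str, np.ndarray]) -> str:
    """Recognition: return the enrolled name whose embedding is nearest to the probe."""
    return min(gallery, key=lambda name: float(np.linalg.norm(probe - gallery[name])))


def verify(probe: np.ndarray, reference: np.ndarray, threshold: float) -> bool:
    """Verification: accept the claim if the probe is close enough to ONE reference embedding."""
    return bool(np.linalg.norm(probe - reference) < threshold)


# Hypothetical precomputed embeddings (in practice produced by the twin network).
gallery = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0]),
    "carol": np.array([0.0, 0.0, 1.0]),
}
probe = np.array([0.9, 0.1, 0.0])  # embedding of the turnstile or pass photo

print(identify(probe, gallery))                         # alice
print(verify(probe, gallery["alice"], threshold=0.5))   # True
print(verify(probe, gallery["bob"], threshold=0.5))     # False
```

Identification needs a search whose cost grows with the size of the gallery, whereas verification is a single comparison against a threshold, which is one reason the two implementations can differ even when the twin network is the same.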

## Learning

Learning in twin networks can be done with triplet loss or contrastive loss. For learning by triplet loss, a baseline vector (anchor image) is compared against a positive vector (a matching image) and a negative vector (a non-matching image). The negative vector forces learning in the network, while the positive vector acts like a regularizer. For learning by contrastive loss, there must be weight decay to regularize the weights, or some similar operation such as normalization.
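Both losses can be written in a few lines. The sketch below assumes one common margin-based form of each, computed on embeddings that the twin network has already produced; the `margin` hyperparameter and its default of 1.0 are assumptions, since the text above does not fix them. The triplet loss compares squared Euclidean distances from the anchor, and the contrastive loss works on one labelled pair at a time.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on squared Euclidean distances.

    Zero once the negative is at least `margin` farther (in squared
    distance) from the anchor than the positive is.
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, float(d_pos - d_neg + margin))


def contrastive_loss(x1, x2, same, margin=1.0):
    """Margin-based contrastive loss for a single pair.

    Similar pairs (same=True) are pulled together; dissimilar pairs are
    pushed apart only until their Euclidean distance reaches `margin`.
    """
    d = float(np.linalg.norm(x1 - x2))
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


a = np.array([0.0, 0.0])   # anchor embedding
p = np.array([0.1, 0.0])   # positive embedding, close to the anchor
n = np.array([2.0, 0.0])   # negative embedding, far from the anchor

print(triplet_loss(a, p, n))               # 0.0: negative already far enough away
print(triplet_loss(a, n, p))               # roles swapped: large loss (about 4.99)
print(contrastive_loss(a, n, same=False))  # 0.0: pair is already beyond the margin
```

Because the contrastive loss only pushes dissimilar pairs out to the margin, nothing in it bounds how large the weights can grow, which is consistent with the need for weight decay or normalization mentioned above.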

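A distance metric for a loss function must have the following properties:^{[4]}

- Non-negativity: $\delta(x, y) \ge 0$
- Identity of indiscernibles: $\delta(x, y) = 0 \iff x = y$
- Symmetry: $\delta(x, y) = \delta(y, x)$
- Triangle inequality: $\delta(x, z) \le \delta(x, y) + \delta(y, z)$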

In particular, the triplet loss algorithm is often defined with squared Euclidean distance at its core.
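The properties above can be checked numerically on a handful of points. This sketch is an illustration rather than a proof, and the helper name `satisfies_triangle` is hypothetical. It compares the Euclidean distance with the squared Euclidean distance: three collinear points are enough to show that squaring breaks the triangle inequality, so the squared Euclidean distance used in many triplet-loss formulations is not, strictly speaking, a metric, even though it keeps non-negativity, identity of indiscernibles and symmetry.

```python
import itertools

import numpy as np


def euclidean(x, y):
    return float(np.linalg.norm(x - y))


def squared_euclidean(x, y):
    return float(np.sum((x - y) ** 2))


def satisfies_triangle(dist, points):
    """Check dist(x, z) <= dist(x, y) + dist(y, z) for every ordered triple of points."""
    return all(
        dist(x, z) <= dist(x, y) + dist(y, z) + 1e-12  # small tolerance for rounding
        for x, y, z in itertools.product(points, repeat=3)
    )


# Three collinear points: 0, 1 and 2 on the real line.
points = [np.array([0.0]), np.array([1.0]), np.array([2.0])]

print(satisfies_triangle(euclidean, points))          # True
print(satisfies_triangle(squared_euclidean, points))  # False: 4 > 1 + 1
```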

### Predefined metrics, Euclidean distance metric

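The common learning goal is to minimize the distance for similar objects and maximize it for distinct ones. This gives a loss function like

$$\delta(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \begin{cases} \min \lVert f(\mathbf{x}^{(i)}) - f(\mathbf{x}^{(j)}) \rVert, & i = j \\ \max \lVert f(\mathbf{x}^{(i)}) - f(\mathbf{x}^{(j)}) \rVert, & i \neq j \end{cases}$$

- $i, j$ are indexes into a set of vectors
- $f(\cdot)$ is the function implemented by the twin network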

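The most common distance metric used is the Euclidean distance, in which case the loss function can be rewritten in matrix form as

$$\delta(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) \approx \left(f(\mathbf{x}^{(i)}) - f(\mathbf{x}^{(j)})\right)^{T} \left(f(\mathbf{x}^{(i)}) - f(\mathbf{x}^{(j)})\right)$$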
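A minimal sketch of the Euclidean case, assuming a hypothetical one-layer encoder $f(x) = \tanh(Wx)$ stands in for the twin network. Both inputs pass through the same matrix `W`, which is what "shared weights" means in practice, and the squared Euclidean distance between the two outputs is computed as the inner product $d^{T} d$ of the difference vector $d$ with itself. The encoder shape and the random weights are placeholders, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))  # shared weights: ONE matrix used for both inputs


def f(x):
    """Hypothetical twin-network encoder: the same W embeds every input."""
    return np.tanh(W @ x)


def squared_distance(x_i, x_j):
    """Squared Euclidean distance between the two embeddings, computed as d^T d."""
    d = f(x_i) - f(x_j)
    return float(d @ d)


x_i = rng.normal(size=8)
x_j = rng.normal(size=8)

print(squared_distance(x_i, x_j))
print(squared_distance(x_i, x_i))  # 0.0: identical inputs give identical embeddings
print(np.isclose(squared_distance(x_i, x_j), squared_distance(x_j, x_i)))  # True
```

Because the weights are shared, swapping the two inputs only flips the sign of the difference vector, so the distance is symmetric by construction.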

### Learned metrics, nonlinear distance metric

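A more general case is where the output vectors from the twin network are passed through additional network layers implementing non-linear distance metrics:

$$\delta(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \begin{cases} \min g\left(f(\mathbf{x}^{(i)}), f(\mathbf{x}^{(j)})\right), & i = j \\ \max g\left(f(\mathbf{x}^{(i)}), f(\mathbf{x}^{(j)})\right), & i \neq j \end{cases}$$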

- $i, j$ are indexes into a set of vectors
- $f(\cdot)$ is the function implemented by the twin network
- $g(\cdot)$ is the function implemented by the network joining outputs from the twin network
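The joining network can be any differentiable function of the two embeddings. The sketch below assumes one common choice that the text does not prescribe: a shared encoder $f$, and a joining network $g$ built as a small two-layer perceptron applied to the element-wise absolute difference $|f(x^{(i)}) - f(x^{(j)})|$, followed by a softplus so the score is non-negative. All weights are random placeholders; in practice $f$ and $g$ would be trained jointly with one of the losses described under Learning.

```python
import numpy as np

rng = np.random.default_rng(1)

W_f = rng.normal(size=(4, 8))    # hypothetical shared encoder f (the twin network)
W_g1 = rng.normal(size=(16, 4))  # hypothetical joining network g, hidden layer
w_g2 = rng.normal(size=16)       # hypothetical joining network g, output layer


def f(x):
    return np.tanh(W_f @ x)


def g(u, v):
    """Learned, nonlinear distance between two embeddings u and v."""
    h = np.maximum(0.0, W_g1 @ np.abs(u - v))  # ReLU hidden layer on |u - v|
    return float(np.logaddexp(0.0, w_g2 @ h))   # softplus keeps the score >= 0


def delta(x_i, x_j):
    return g(f(x_i), f(x_j))


x_i, x_j = rng.normal(size=8), rng.normal(size=8)
print(delta(x_i, x_j))
print(delta(x_i, x_j) == delta(x_j, x_i))  # True: |u - v| makes g symmetric
print(delta(x_i, x_i))                     # log(2), about 0.693, not 0
```

The last line shows a design consequence: unlike the Euclidean case, nothing forces a learned $g$ to return zero for identical inputs, so whether the metric properties hold depends on the architecture and on what $g$ learns.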

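In matrix form, the learned metric is often approximated by a Mahalanobis distance for a linear space,^{[5]} where $\mathbf{M}$ is a positive semi-definite matrix:

$$\delta(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) \approx \left(f(\mathbf{x}^{(i)}) - f(\mathbf{x}^{(j)})\right)^{T} \mathbf{M} \left(f(\mathbf{x}^{(i)}) - f(\mathbf{x}^{(j)})\right)$$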

This can be further subdivided into at least unsupervised learning and supervised learning.
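A Mahalanobis-style distance has the form $(u - v)^{T} M (u - v)$, where $u$ and $v$ stand for two twin-network outputs. One standard way to keep a learned $M$ valid is to parameterize it as $M = L^{T} L$, which makes it positive semi-definite by construction, so the squared distance can never be negative. The sketch below assumes that parameterization, with a random `L` as a placeholder for learned parameters, and checks two identities: the result equals the squared Euclidean distance after mapping both vectors through $L$, and with $M = I$ it reduces to the plain squared Euclidean distance.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 4
L = rng.normal(size=(dim, dim))  # placeholder for learned parameters
M = L.T @ L                      # positive semi-definite by construction


def mahalanobis_sq(u, v, M):
    """Squared Mahalanobis-style distance (u - v)^T M (u - v)."""
    d = u - v
    return float(d @ M @ d)


u, v = rng.normal(size=dim), rng.normal(size=dim)  # stand-ins for f(x_i), f(x_j)

# With M = L^T L this is the squared Euclidean distance after the linear map L.
print(np.isclose(mahalanobis_sq(u, v, M), np.sum((L @ u - L @ v) ** 2)))      # True
# With M = I it reduces to the plain squared Euclidean distance.
print(np.isclose(mahalanobis_sq(u, v, np.eye(dim)), np.sum((u - v) ** 2)))   # True
```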

### Learned metrics, half-twin networks

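This form also allows the twin network to be more of a half-twin, implementing slightly different functions in its two branches:

$$\delta(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \begin{cases} \min g\left(f(\mathbf{x}^{(i)}), h(\mathbf{x}^{(j)})\right), & i = j \\ \max g\left(f(\mathbf{x}^{(i)}), h(\mathbf{x}^{(j)})\right), & i \neq j \end{cases}$$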

- $i, j$ are indexes into a set of vectors
- $f(\cdot), h(\cdot)$ are the functions implemented by the half-twin network
- $g(\cdot)$ is the function implemented by the network joining outputs from the twin network
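In a half-twin arrangement the two branches no longer share weights, so they can even accept inputs of different sizes or types, as long as both map into a common embedding space that the joining function can compare. The sketch below assumes two hypothetical linear-plus-tanh branches, `f` for 8-dimensional inputs and `h` for 5-dimensional inputs, and uses cosine distance as a simple stand-in for the joining function `g`; a real system could use a learned network there instead.

```python
import numpy as np

rng = np.random.default_rng(3)
W_f = rng.normal(size=(4, 8))  # branch f: 8-dimensional inputs, its own weights
W_h = rng.normal(size=(4, 5))  # branch h: 5-dimensional inputs, separate weights


def f(x):
    return np.tanh(W_f @ x)


def h(y):
    return np.tanh(W_h @ y)


def g(u, v):
    """Cosine distance between two embeddings (stand-in for a learned joining network)."""
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


x = rng.normal(size=8)  # input of the first type
y = rng.normal(size=5)  # input of a different type and size
score = g(f(x), h(y))

print(f(x).shape, h(y).shape)  # (4,) (4,): both branches land in the same space
print(0.0 <= score <= 2.0)     # True: cosine distance lies in [0, 2]
```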

## Twin Networks for Object Tracking

Twin networks have been used in object tracking because of their two tandem inputs and their similarity measurement. In object tracking, one input of the twin network is an exemplar image pre-selected by the user, and the other is a larger search image; the twin network's job is to locate the exemplar inside the search image. By measuring the similarity between the exemplar and each part of the search image, the twin network produces a map of similarity scores. Furthermore, with a fully convolutional network, computing the similarity score of each region separately can be replaced by a single cross-correlation layer.^{[6]}
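The cross-correlation step can be sketched directly on feature maps. The example below assumes the exemplar and search images have already been embedded by the same convolutional branch into arrays of shape (channels, height, width); random and zero arrays stand in for those embeddings, with the exemplar features planted at a known position so the result can be checked. Sliding the exemplar features over the search features and summing the element-wise products at each offset yields the similarity score map, whose peak marks the most likely target location. A real tracker would run this as one convolution layer on a GPU rather than as Python loops.

```python
import numpy as np


def cross_correlate(exemplar, search):
    """Score map: inner product of the exemplar with every same-sized window of the search map.

    exemplar: (C, h, w) feature map; search: (C, H, W) feature map.
    Returns an array of shape (H - h + 1, W - w + 1).
    """
    _, h, w = exemplar.shape
    _, H, W = search.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            scores[r, c] = np.sum(exemplar * search[:, r:r + h, c:c + w])
    return scores


rng = np.random.default_rng(4)
exemplar = rng.normal(size=(3, 4, 4))  # stand-in for the embedded exemplar image
search = np.zeros((3, 12, 12))         # stand-in for the embedded search image
search[:, 5:9, 2:6] = exemplar         # plant the target at row 5, column 2

scores = cross_correlate(exemplar, search)
print(scores.shape)                                       # (9, 9)
print(np.unravel_index(np.argmax(scores), scores.shape))  # peak at the planted offset
```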

Since its introduction in 2016, the twin fully convolutional network has been used in many high-performance, real-time object-tracking neural networks, such as CFnet,^{[7]} StructSiam,^{[8]} SiamFC-tri,^{[9]} DSiam,^{[10]} SA-Siam,^{[11]} SiamRPN,^{[12]} DaSiamRPN,^{[13]} Cascaded SiamRPN,^{[14]} SiamMask,^{[15]} SiamRPN++^{[16]} and Deeper and Wider SiamRPN.^{[17]}

## References

1. Bromley, Jane; Guyon, Isabelle; LeCun, Yann; Säckinger, Eduard; Shah, Roopak (1994). "Signature verification using a 'Siamese' time delay neural network" (PDF). *Advances in Neural Information Processing Systems 6*: 737–744.
2. Chopra, S.; Hadsell, R.; LeCun, Y. (June 2005). "Learning a similarity metric discriminatively, with application to face verification". *2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)*. Vol. 1: 539–546. doi:10.1109/CVPR.2005.202. ISBN 0-7695-2372-2.
3. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. (June 2014). "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". *2014 IEEE Conference on Computer Vision and Pattern Recognition*: 1701–1708. doi:10.1109/CVPR.2014.220. ISBN 978-1-4799-5118-5.
4. Chatterjee, Moitreya; Luo, Yunan. "Similarity Learning with (or without) Convolutional Neural Network" (PDF). Retrieved 2018-12-07.
5. Mahalanobis, P. C. (1936). "On the generalized distance in statistics" (PDF). *Proceedings of the National Institute of Sciences of India*. 2 (1): 49–55.
6. "Fully-Convolutional Siamese Networks for Object Tracking". arXiv:1606.09549.
7. "End-to-end representation learning for Correlation Filter based tracking".
8. "Structured Siamese Network for Real-Time Visual Tracking" (PDF).
9. "Triplet Loss in Siamese Network for Object Tracking" (PDF).
10. "Learning Dynamic Siamese Network for Visual Object Tracking" (PDF).
11. "A Twofold Siamese Network for Real-Time Object Tracking" (PDF).
12. "High Performance Visual Tracking with Siamese Region Proposal Network" (PDF).
13. Zhu, Zheng; Wang, Qiang; Li, Bo; Wu, Wei; Yan, Junjie; Hu, Weiming (2018). "Distractor-aware Siamese Networks for Visual Object Tracking". arXiv:1808.06048 [cs.CV].
14. Fan, Heng; Ling, Haibin (2018). "Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking". arXiv:1812.06148 [cs.CV].
15. Wang, Qiang; Zhang, Li; Bertinetto, Luca; Hu, Weiming; Torr, Philip H. S. (2018). "Fast Online Object Tracking and Segmentation: A Unifying Approach". arXiv:1812.05050 [cs.CV].
16. Li, Bo; Wu, Wei; Wang, Qiang; Zhang, Fangyi; Xing, Junliang; Yan, Junjie (2018). "SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks". arXiv:1812.11703 [cs.CV].
17. Zhang, Zhipeng; Peng, Houwen (2019). "Deeper and Wider Siamese Networks for Real-Time Visual Tracking". arXiv:1901.01660 [cs.CV].