What is a distance?
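take any two texts:

| | the | and | to | of | a | was | I | in | he | said | you |
|---|---|---|---|---|---|---|---|---|---|---|---|
| lewis_lion | 5.141 | 3.699 | 2.295 | 2.185 | 2.100 | 1.346 | 0.813 | 1.162 | 1.087 | 1.426 | 1.141 |
| tolkien_lord1 | 5.624 | 3.782 | 2.074 | 2.597 | 1.916 | 1.313 | 1.492 | 1.419 | 1.221 | 0.825 | 0.872 |
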
subtract the values vertically:

| the | and | to | of | a | was | I | in | he | said | you |
|---|---|---|---|---|---|---|---|---|---|---|
| -0.483 | -0.083 | 0.221 | -0.412 | 0.184 | 0.033 | -0.679 | -0.257 | -0.134 | 0.601 | 0.269 |

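then drop the minus signs (i.e. take the absolute values):

| the | and | to | of | a | was | I | in | he | said | you |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.483 | 0.083 | 0.221 | 0.412 | 0.184 | 0.033 | 0.679 | 0.257 | 0.134 | 0.601 | 0.269 |
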
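sum up the obtained values:
\[ 0.483 + 0.083 + 0.221 + 0.412 + 0.184 + 0.033 + 0.679 + 0.257 + 0.134 + 0.601 + 0.269 = 3.356 \]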
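The same steps (subtract, drop the signs, sum) can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the numbers are the word frequencies of the two texts shown above.

```python
# Word frequencies of the two texts, in the order:
# the, and, to, of, a, was, I, in, he, said, you
lewis_lion = [5.141, 3.699, 2.295, 2.185, 2.100, 1.346,
              0.813, 1.162, 1.087, 1.426, 1.141]
tolkien_lord1 = [5.624, 3.782, 2.074, 2.597, 1.916, 1.313,
                 1.492, 1.419, 1.221, 0.825, 0.872]

# Step 1: subtract the values vertically (word by word)
differences = [a - b for a, b in zip(lewis_lion, tolkien_lord1)]

# Step 2: drop the minus signs
absolute = [abs(d) for d in differences]

# Step 3: sum up the obtained values
distance = sum(absolute)

print(round(distance, 3))  # 3.356
```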
Manhattan vs. Euclidean
Euclidean distance
between any two texts represented by two points A and B in an n-dimensional space can be defined as:
\[ \delta_{AB} = \sqrt{ \sum_{i = 1}^{n} (A_i - B_i)^2 } \]
where A and B are the two documents to be compared, and \(A_i,\, B_i\) are the scaled (z-scored) frequencies of the i-th word among the n most frequent words.
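A minimal sketch of the Euclidean formula in Python follows. The function name `euclidean` is hypothetical. As a simplifying assumption, it is applied to the raw frequencies of the two example texts rather than to z-scores, because z-scoring needs means and standard deviations computed over a whole corpus.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same length")
    # Square each word-by-word difference, sum, then take the square root
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Raw (not z-scored) frequencies: an assumption made for illustration only
lewis_lion = [5.141, 3.699, 2.295, 2.185, 2.100, 1.346,
              0.813, 1.162, 1.087, 1.426, 1.141]
tolkien_lord1 = [5.624, 3.782, 2.074, 2.597, 1.916, 1.313,
                 1.492, 1.419, 1.221, 0.825, 0.872]

d_euclid = euclidean(lewis_lion, tolkien_lord1)
print(round(d_euclid, 3))  # 1.213
```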
Manhattan distance
can be formalized as follows:
\[ \delta_{AB} = \sum_{i = 1}^{n} | A_i - B_i | \]
which is equivalent to
\[ \delta_{AB} = \sqrt[1]{ \sum_{i = 1}^{n} | A_i - B_i |^1 } \]
(the above weird notation will soon become useful)
Euclidean and Manhattan are siblings!
\[ \delta_{AB} = \sqrt[2]{ \sum_{i = 1}^{n} (A_i - B_i)^2 } \]
vs.
\[ \delta_{AB} = \sqrt[1]{ \sum_{i = 1}^{n} | A_i - B_i |^1 } \]
For that reason, Manhattan and Euclidean are named L1 and L2, respectively.
An (infinite) family of distances
- The above observations can be further generalized
- Both Manhattan and Euclidean belong to a family of (possible) distances:
\[ \delta_{AB} = \sqrt[p]{ \sum_{i = 1}^{n} | A_i - B_i |^p } \]
where p is both the power and the degree of the root.
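The general formula translates almost literally into code. The sketch below uses a hypothetical helper `minkowski` and two toy points that are not taken from the texts above. It checks that p = 1 and p = 2 give back the Manhattan and Euclidean distances.

```python
def minkowski(a, b, p):
    """Distance of order p: the p-th root of the sum of |a_i - b_i|^p."""
    if p <= 0:
        raise ValueError("p must be positive")
    if len(a) != len(b):
        raise ValueError("both vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Toy points chosen so that the results are easy to verify by hand
A = [0.0, 0.0]
B = [3.0, 4.0]

print(minkowski(A, B, 1))  # 7.0  (Manhattan: 3 + 4)
print(minkowski(A, B, 2))  # 5.0  (Euclidean: sqrt(9 + 16))
```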
The norms L1, L2, L3, … (and beyond)
- The power p doesn’t need to be a natural number
- We can easily imagine norms such as L1.01, L3.14159, L1¾, L\(\sqrt{2}\) etc.
- Mathematically, \(p < 1\) doesn’t satisfy the formal definition of a norm (the triangle inequality fails)…
- … yet still, one can easily imagine a dissimilarity L0.5 or L0.0001.
- (plus, the so-called Cosine Distance doesn’t satisfy the definition either).
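To see why \(p < 1\) does not give a norm, one can check the triangle inequality on three toy points. The points and the helper `minkowski` below are illustrative assumptions, not part of the original material. With p = 0.5 the direct route from A to C comes out longer than the detour through B.

```python
def minkowski(a, b, p):
    """Dissimilarity of order p (a true distance only for p >= 1)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Three corners of a unit square (toy values)
A = [0.0, 0.0]
B = [1.0, 0.0]
C = [1.0, 1.0]

for p in (0.5, 1.0, 2.0):
    direct = minkowski(A, C, p)
    detour = minkowski(A, B, p) + minkowski(B, C, p)
    # The triangle inequality requires direct <= detour
    print(p, direct, detour, direct <= detour)
```

For p = 0.5 the direct value is 4.0 while the detour is 2.0, so the inequality is violated. For p = 1 and p = 2 it holds.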
To summarize…
- The parameter p varies along a continuum
- Both \(p = 1\) and \(p = 2\) (for Manhattan and Euclidean, respectively) are but two specific points in this continuous space
- p is a method’s hyperparameter to be set or possibly tuned
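As a rough illustration of p as a tunable setting, the sketch below evaluates the same pair of texts over a small grid of p values. The grid and the helper `minkowski` are arbitrary assumptions, and the raw frequencies of the two example texts are used in place of z-scores. Actual tuning would need an evaluation criterion, such as attribution accuracy on texts of known authorship, which is not shown here.

```python
def minkowski(a, b, p):
    """Dissimilarity of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


lewis_lion = [5.141, 3.699, 2.295, 2.185, 2.100, 1.346,
              0.813, 1.162, 1.087, 1.426, 1.141]
tolkien_lord1 = [5.624, 3.782, 2.074, 2.597, 1.916, 1.313,
                 1.492, 1.419, 1.221, 0.825, 0.872]

# An arbitrary grid of candidate values for the hyperparameter p
p_grid = [0.5, 1.0, 1.5, 2.0, 3.0]
distances = {p: minkowski(lewis_lion, tolkien_lord1, p) for p in p_grid}

for p, d in distances.items():
    print(f"L{p:g}: {d:.3f}")
```

For a fixed pair of texts the value shrinks as p grows, so distances computed with different p are not directly comparable. What matters is how each p ranks the texts relative to one another.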