Distance Metrics for Machine Learning

Thu Dec 25 2025

Measuring the distance between data points is a foundational and effective tool in statistical learning. Distance metrics underpin prominent algorithms such as k-means clustering and A* search.

The prominent distance metrics are:

Hamming Distance
Euclidean Distance
Manhattan Distance
Minkowski Distance

Hamming Distance

Hamming Distance measures the number of positions at which two equal-length strings or vectors differ. In other words, it counts how many symbols are different.

It is commonly used with binary vectors that represent categorical columns of data. You are likely to encounter such vectors when the data has been transformed with one-hot encoding, dummy coding, or indicator functions.
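To make the link to one-hot encoding concrete, here is a minimal sketch. The `one_hot` helper and the `colours` column are hypothetical names chosen for illustration, not part of any library. It encodes two categories as binary vectors and counts the positions where they disagree.

```python
def one_hot(value, categories):
    """Return a binary vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"Unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

# Hypothetical categorical column with three possible values
colours = ["red", "green", "blue"]

red = one_hot("red", colours)    # [1, 0, 0]
blue = one_hot("blue", colours)  # [0, 0, 1]

# Count the positions where the two encodings disagree
differing = sum(r != b for r, b in zip(red, blue))
print(red, blue, differing)
```

Two different categories encoded this way always disagree in exactly two positions: the slot holding the first category's 1 and the slot holding the second's.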

Formal Definition :

For two strings $x$ and $y$ of equal length $n$, the Hamming distance is defined as:

$$
d_H(x, y) = \sum_{i=1}^{n} \mathbf{1}(x_i \neq y_i)
$$

Here $\mathbf{1}(\cdot)$ is the indicator function, so the sum adds $1$ every time the symbols at position $i$ are different.
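As a worked example, take $x = (1, 0, 1, 1, 0)$ and $y = (1, 1, 1, 0, 0)$, so $n = 5$. These vectors are chosen purely for illustration. They differ only at positions $2$ and $4$, so the indicator contributes $1$ at those two positions and $0$ elsewhere:

$$
d_H(x, y) = 0 + 1 + 0 + 1 + 0 = 2
$$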

Here's a Python implementation of Hamming Distance:

# Calculate Hamming Distance between two equal-length vectors

def hamming_distance(a, b):
    """
    Computes the Hamming Distance between two equal-length vectors
    a and b.
    Returns the number of differing positions.
    """
    if len(a) != len(b):
        raise ValueError("Vectors must be of equal length")
    # Compare with != rather than subtracting, so that the count matches
    # the formal definition for any symbols, not only 0/1 values
    return sum(e1 != e2 for e1, e2 in zip(a, b))

# Example usage
row1 = [0, 0, 0, 0, 0, 1]
row2 = [0, 0, 0, 0, 1, 0]

dist = hamming_distance(row1, row2)
print("Hamming Distance:", dist)  # Output: 2
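The formal definition applies to any symbols, not only bits. The sketch below assumes the inputs are equal-length sequences whose elements can be compared with `!=`. The name `hamming_distance_symbols` is hypothetical and used only for this illustration. It applies the same count to character strings and to the binary rows from the example above.

```python
def hamming_distance_symbols(x, y):
    """Count the positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("Sequences must be of equal length")
    # Each inequality test yields True or False, which sum() counts as 1 or 0
    return sum(s != t for s, t in zip(x, y))

# Strings: positions 3, 4 and 5 differ (r/t, o/h, l/r)
print(hamming_distance_symbols("karolin", "kathrin"))  # 3

# Binary vectors: the last two positions differ
print(hamming_distance_symbols([0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0]))  # 2
```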
