Distance Metrics for Machine Learning
Thu Dec 25 2025
Measuring the distance between data points is a foundational tool in statistical learning. Many well-known algorithms depend on it, including k-means clustering and the A-star search algorithm.
The most prominent distance metrics are:
- Hamming Distance
- Euclidean Distance
- Manhattan Distance
- Minkowski Distance
Hamming Distance
Hamming Distance measures the number of positions at which two equal-length strings or vectors differ. In other words, it counts how many symbols are different.
It is commonly used on binary vectors that represent categorical columns of data. You are likely to encounter such vectors when the data has been transformed using one-hot encoding, dummy coding, or indicator functions.
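As a rough illustration of how one-hot encoded categories end up being compared, the sketch below encodes two values of a categorical column by hand and counts the positions where the encodings differ. The `one_hot` helper and the color categories are hypothetical names chosen for this example, not part of any library.

```python
# Hypothetical helper: one-hot encode a value against a fixed category list.
def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

categories = ["red", "green", "blue"]  # assumed example categories

red = one_hot("red", categories)    # [1, 0, 0]
blue = one_hot("blue", categories)  # [0, 0, 1]

# Count the positions where the two encoded vectors differ.
diff = sum(a != b for a, b in zip(red, blue))
print(diff)  # 2
```

Two different categories encoded this way always differ in exactly two positions, and identical categories differ in none.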
Formal Definition:
For two strings $a$ and $b$ of equal length $n$, the Hamming distance is defined as:
$$d_H(a, b) = \sum_{i=1}^{n} \mathbb{1}[a_i \neq b_i]$$
It adds 1 every time the symbols at position $i$ are different.
Here's a Python implementation of Hamming Distance:
# Calculate the Hamming Distance between two equal-length vectors
def hamming_distance(a, b):
    """
    Computes the Hamming Distance between two equal-length vectors
    a and b.
    Returns the number of differing positions.
    """
    if len(a) != len(b):
        raise ValueError("Vectors must be of equal length")
    # Count the positions where the elements differ
    return sum(e1 != e2 for e1, e2 in zip(a, b))

# Example usage
row1 = [0, 0, 0, 0, 0, 1]
row2 = [0, 0, 0, 0, 1, 0]
dist = hamming_distance(row1, row2)
print("Hamming Distance:", dist)  # Output: 2
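Since the definition covers strings as well as vectors, it may help to see the same count applied to both. This is a minimal sketch, assuming a hypothetical helper named `hamming_symbols`. Comparing symbols with `!=` rather than subtracting them lets the count work on strings and other non-numeric sequences.

```python
def hamming_symbols(a, b):
    """Count positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Sequences must be of equal length")
    # True counts as 1 when summed, so this tallies the mismatches.
    return sum(x != y for x, y in zip(a, b))

# Strings: positions 3, 4 and 5 differ (r/t, o/h, l/r).
print(hamming_symbols("karolin", "kathrin"))  # 3

# Integer vectors: the second and fourth positions differ.
print(hamming_symbols([1, 0, 1, 1], [1, 1, 1, 0]))  # 2
```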