Saturday, July 26, 2008

SAS Tips 1001 - Tip 12 - Calculating Mahalanobis distance - Not so commonly known

This material is reposted here from SAS:

http://support.sas.com/kb/30/662.html

This sample shows one way of computing Mahalanobis distance in each of the following scenarios:
from each observation to the mean
from each observation to a specific observation
from each observation to all other observations (all possible pairs)

1) To compute the Mahalanobis distance from each observation to the mean, first run PROC PRINCOMP with the STD option. The STD option standardizes the principal component scores in the OUT= data set to unit variance. Because the components are also uncorrelated, these scores have an identity covariance matrix, and for such scores the Mahalanobis distance equals the Euclidean distance. Then use a DATA step with a statement such as:
mahalanobis_distance_to_mean = sqrt(uss(of prin:));
to compute the distance.
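A minimal sketch of step 1, assuming the SASHELP.IRIS sample data set and its four measurement variables stand in for your own data (the data set names PCSCORES and MAHAL are hypothetical). Why it works: if the covariance matrix is S = VΛV', the standardized scores are z = Λ^(-1/2)V'(x - x̄), so z'z = (x - x̄)'S^(-1)(x - x̄), which is the squared Mahalanobis distance. PROC PRINCOMP analyzes the correlation matrix by default, but the Mahalanobis distance does not change when variables are rescaled, so the result is the same as with the COV option.

```sas
/* Hypothetical example: SASHELP.IRIS stands in for your own data.   */
/* STD scales each principal component score to unit variance, and   */
/* all components are kept by default, so the scores have an         */
/* identity covariance matrix.                                       */
proc princomp data=sashelp.iris std out=pcscores noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* For identity-covariance scores, the Euclidean norm of the score   */
/* vector is the Mahalanobis distance to the mean.                   */
/* Note: "prin:" matches every variable whose name starts with PRIN. */
data mahal;
   set pcscores;
   mahalanobis_distance_to_mean = sqrt(uss(of prin:));
run;
```

If the input data already contain other variables whose names begin with Prin, drop or rename them first so that the `prin:` list picks up only the component scores.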
2) To compute the Mahalanobis distance from each observation to a specific point, first compute the principal component scores for that point using the original scoring coefficients. Then compute the Euclidean distance from each observation's scores to the reference point's scores. One easy way to do this is to use PROC FASTCLUS with the reference point's scores as the SEED= data set.
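A sketch of step 2 under the same assumptions as step 1. The reference point values are hypothetical, as are the data set names. It assumes, based on our reading of the PROC PRINCOMP documentation, that with the STD option the SCORE rows of the OUTSTAT= data set reproduce the standardized scores written to OUT=; scoring the original data with PROC SCORE and comparing the result with Prin1-Prin4 is a quick way to confirm this. It also assumes that MAXITER=0 with REPLACE=NONE keeps the single seed fixed, so the DISTANCE variable in the FASTCLUS OUT= data set is measured from the reference point.

```sas
/* Keep the scoring coefficients in OUTSTAT= (assumed to reflect STD) */
proc princomp data=sashelp.iris std out=pcscores outstat=pcstat noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Hypothetical reference point, in the original units (mm) */
data refpoint;
   SepalLength = 50; SepalWidth = 30; PetalLength = 40; PetalWidth = 10;
run;

/* Score the reference point with the original coefficients; PROC    */
/* SCORE names the new variables Prin1-Prin4 from _NAME_ in PCSTAT.  */
proc score data=refpoint score=pcstat out=refscores;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* One cluster seeded at the reference point. MAXITER=0 and          */
/* REPLACE=NONE are assumed to keep the seed unchanged, so DISTANCE   */
/* in OUT= is the Euclidean distance from each score vector to it.   */
proc fastclus data=pcscores seed=refscores maxclusters=1 maxiter=0
              replace=none out=dist_to_ref noprint;
   var Prin1-Prin4;
run;
```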
3) To compute the Mahalanobis distances between all possible pairs of observations, run PROC DISTANCE on the OUT= data set created by PROC PRINCOMP in step 1. PROC DISTANCE computes the Euclidean distances for all possible pairs automatically.
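A sketch of step 3, again assuming SASHELP.IRIS and hypothetical data set names. METHOD=EUCLID is written out explicitly. Because the standardized scores already have mean 0 and standard deviation 1, standardizing them by mean and standard deviation would leave them unchanged; check the PROC DISTANCE documentation for its default standardization before switching to another method. The output holds one row per observation, so it grows quickly with the number of observations.

```sas
/* Standardized principal component scores, as in step 1 */
proc princomp data=sashelp.iris std out=pcscores noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Pairwise Euclidean distances between the standardized scores are  */
/* the pairwise Mahalanobis distances between the original rows.      */
proc distance data=pcscores out=mahal_pairs method=euclid;
   var interval(Prin1-Prin4);
run;
```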
