MatArray Toolbox
kmeans
Syntax
[mship, proto, er] = kmeans(M, nc)
[mship, proto, er] = kmeans(M, nc, beg)
[mship, proto, er] = kmeans(M, nc, beg, options)
Description
The kmeans function performs a K-means clustering
of the columns of the matrix M into nc
clusters. The result consists of two parts: the
nc cluster prototypes proto and
the cluster memberships mship. The K-means
error, that is, the sum of the squared distances
between each column of M and its cluster prototype,
is stored in er.
This is the criterion the K-means algorithm tries to minimize.
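As a rough illustration of the error criterion described above, the following sketch recomputes er by hand on toy data. It assumes that proto stores one prototype per column and that mship is a row vector of cluster indices between 1 and nc; neither layout is confirmed by this page.

```matlab
% Toy data: four 2-dimensional items stored as columns of M.
M = [0 0 10 10;
     0 2  0  2];
mship = [1 1 2 2];   % assumed layout: cluster index of each column of M
proto = [0 10;
         1  1];      % assumed layout: one prototype per column

% Residual of every item with respect to its own prototype.
resid = M - proto(:, mship);

% K-means error: sum of the squared Euclidean distances.
er = sum(resid(:).^2);
```

Each item here lies at distance 1 from its prototype, so the four squared distances add up to 4.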
beg is the starting point for the algorithm.
If it is a row vector, it is taken as the starting
mship; otherwise it is taken as the starting
proto. If beg is not given,
or its length is zero,
a random mship is drawn and the algorithm starts
from there.
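The rule for interpreting beg can be sketched as follows. This is only a reading of the description above, not the code of the function itself; the names mship0 and proto0 are made up for the illustration, and drawing the random memberships with randi is an assumption.

```matlab
M   = randn(5, 20);   % 20 items stored as columns
nc  = 3;
beg = [];             % try also a 1-by-20 membership or a 5-by-3 prototype matrix

if isempty(beg)
    % No starting point given: draw a random membership (assumed: uniform draw).
    mship0 = randi(nc, 1, size(M, 2));
    proto0 = [];
elseif size(beg, 1) == 1
    % Row vector: interpreted as the starting memberships.
    mship0 = beg;
    proto0 = [];
else
    % Anything else: interpreted as the starting prototypes.
    proto0 = beg;
    mship0 = [];
end
```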
If a fourth argument is given, a slightly modified version of the algorithm is used to try to improve the solution (see References). The modification is cheap and cannot increase the error, but since it is not part of the classical algorithm it is offered as an option (we nevertheless recommend using it).
In the original version, the function uses dl2c
to calculate the distances between the items and the prototypes.
If there are missing values, you should replace
dl2c by dl2cNaN and mean
by nanmean in the function code.
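To see why mean has to be replaced when values are missing, the sketch below compares a plain mean with a NaN-ignoring mean on a small cluster. The NaN-ignoring mean is written out by hand so that the example does not depend on any toolbox; it is meant to mirror what nanmean does, and says nothing about how dl2cNaN treats missing coordinates.

```matlab
% Three items (columns) of one cluster; one coordinate is missing.
C = [1 NaN 3;
     4   6 8];

% Plain mean along each row: the missing value propagates to the prototype.
plain = mean(C, 2);               % first entry is NaN, second is 6

% NaN-ignoring mean: sum the available values and divide by their count.
ok      = ~isnan(C);
C0      = C;
C0(~ok) = 0;
robust  = sum(C0, 2) ./ sum(ok, 2);   % [2; 6]
```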
Examples
M = randn(10,10);
[mship, proto, er] = kmeans(M, 3);
er

er =
   54.0479

Re-clustering starting from this result

[mship, proto, er] = kmeans(M, 3, mship);
er

er =
   54.0479

gives the same error, as expected. However, using the fancy algorithm

[mship, proto, er] = kmeans(M, 3, mship, 1);
er

er =
   48.3452

the result is significantly improved. This is not systematic, but the error cannot increase and the time penalty is small.
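The fact that restarting the classical algorithm from a converged membership leaves the error unchanged can be reproduced with a generic K-means loop. The sketch below is a textbook iteration written for this illustration, not the code of the toolbox function; it does not include the optional modification and assumes that no cluster becomes empty.

```matlab
% Toy data: two well-separated groups of 2-D items stored as columns.
M  = [0 0 1 1 9  9 10 10;
      0 1 0 1 9 10  9 10];
nc = 2;
n  = size(M, 2);
mship = [1 2 1 2 1 2 1 2];   % deliberately poor starting membership
er = zeros(1, 2);

for pass = 1:2               % pass 2 restarts from the result of pass 1
    for it = 1:100
        % Update step: each prototype is the mean of its members.
        proto = zeros(size(M, 1), nc);
        for k = 1:nc
            proto(:, k) = mean(M(:, mship == k), 2);
        end
        % Assignment step: squared distance from every item to every prototype.
        d2 = zeros(nc, n);
        for k = 1:nc
            diffk = M - repmat(proto(:, k), 1, n);
            d2(k, :) = sum(diffk.^2, 1);
        end
        [dmin, newmship] = min(d2, [], 1);
        if isequal(newmship, mship), break; end   % converged
        mship = newmship;
    end
    er(pass) = sum(dmin);    % K-means error at convergence
end
```

On this data both passes end with the same memberships, so er(1) and er(2) are equal.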
See Also
dl2c,
dl2cNaN,
hierarc
References