Clustering
The following table compares different algorithms:

Silhouette Coefficient

In the following definition \(i\) and \(j\) are data points, and \(C_I\) and \(C_J\) are the clusters that contain \(i\) and \(j\) respectively. \(d(i, j)\) is the distance between \(i\) and \(j\).
$$
\begin{align}
a(i)&=\frac 1 {|C_I|-1} \sum_{j\in C_I,\, i\neq j} d(i,j)\\
b(i)&=\min_{J\neq I} \frac 1 {|C_J|} \sum_{j\in C_J} d(i, j)\\
s(i)&=\frac{b(i) - a(i)}{\max(a(i), b(i))} &\text{if } |C_I| > 1\\
s(i)&=0 & \text{if } |C_I| = 1
\end{align}
$$
\(a(i)\) is the average distance from \(i\) to all other points in the same cluster, while \(b(i)\) is the average distance from \(i\) to all points of the closest neighbouring cluster. \(s(i)\) lies between \(-1\) and \(1\); values close to \(1\) mean that \(i\) is much closer to its own cluster than to any other.
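The definition can be turned into code almost line by line. The following is a minimal sketch, not a reference implementation: the function name `silhouette_values` and the sample points are made up for illustration, \(d\) is assumed to be the Euclidean distance, and the data is assumed to contain at least two clusters and no cluster consisting only of identical points.

```python
from math import dist  # Euclidean distance, assumed here as d(i, j)


def silhouette_values(points, labels):
    """Return s(i) for every point (hypothetical helper, illustration only)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    values = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # |C_I| = 1: the coefficient is defined as 0.
            values.append(0.0)
            continue
        # a(i): average distance to the other points of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest average distance to the points of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        values.append((b - a) / max(a, b))
    return values


# Made-up sample: two tight pairs of points far apart from each other.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = silhouette_values(points, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_values(points, [0, 1, 0, 1])   # pairs split across clusters
```

With the sensible grouping every coefficient is clearly positive, while the second labelling puts each point closer to the other cluster than to its own, so the coefficients become negative.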
The silhouette coefficients of all points can be plotted as in the following diagram. The red line marks the average silhouette coefficient over all points.

If the silhouette coefficient of a point is less than zero, the point is on average closer to a neighbouring cluster than to its own and might be assigned to the wrong cluster.

In the following graphs \(k=3\) or \(k=4\) is best, since both have a high average silhouette coefficient. \(k=4\) has more equally sized clusters, which can be a benefit.
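Comparing values of \(k\) by their average silhouette coefficient can be sketched as follows. This is only an illustration under assumptions: the points and the candidate labellings are made up (in practice the labellings would come from running a clustering algorithm once per \(k\)), the helper `mean_silhouette` is hypothetical, and Euclidean distance is assumed.

```python
from math import dist  # Euclidean distance, assumed here as d(i, j)


def mean_silhouette(points, labels):
    """Average s(i) over all points (hypothetical helper, illustration only)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # s(i) = 0 for a single-point cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Made-up data: three well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]

# Made-up candidate labellings, one per value of k.
candidates = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}

scores = {k: mean_silhouette(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)  # k with the highest average coefficient
```

The average alone does not show how balanced the clusters are, which is why the per-cluster plots are worth inspecting in addition to the single score.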
