Computer Science & Life

An algorithm to find the value of k in k-means clustering algorithm

Introduction

This article aims to present an algorithm which helps you to sort out the problem of determining the value of k of K-means algorithm.

[edit]The Algorithm

Lets say we have 'n' points with us. Make a 2-D matrix of n*n dimension, i.e. M[n][n]. Its a distance matrix , which keeps the distance of one point to all other points.

For example:- Lets we have 10 points such as- A(2,6), B(3,4), C(3,8), D(4,7), E(6,2), F(6,4), G(7,3), H(7,4), I(8,5), J(7,6).
The M[10][10] matrix would be looking something like this-

	Point A(2,6)	Point B(3,4)	Point C(3,8)	Point D(4,7)	Point E(6,2)	Point F(6,4)	Point G(7,3)	Point H(7,4)	Point I(8,5)	Point J(7,6)
Point A(2,6)	-	2.23	2.23	2.23	5.64	5.64	5.83	5.38	6.02	5.38
Point B(3,4)	2.23	-	4	3.16	3.61	3	4.12	4	5.1	4.47
Point C(3,8)	2.23	4	-	1.41	6.78	5	6.4	5.65	5.91	4.47
Point D(4,7)	2.23	3.16	1.41	-	4.47	3.6	5	4.24	4.47	3.16
Point E(6,2)	5.64	3.61	6.78	4.47	-	2	1.41	2.23	3.6	4.12
Point F(6,4)	5.64	3	5	3.6	2	-	1.41	1	2.23	2.23
Point G(7,3)	5.83	4.12	6.4	5	1.41	1.41	-	1	2.23	3
Point H(7,4)	5.38	4	5.65	4.24	2.23	1	1	-	1.41	2
Point I(8,5)	6.02	5.1	5.91	4.47	3.6	2.23	2.23	1.41	-	1.41
Point J(7,6)	5.38	4.47	4.47	3.16	4.12	2.23	3	2	1.41	-

The entry of this matrix is filled by computing the distance between two points. for example: The distance between point(2,6) and point(3,4) can be calculated as sqrt((2-3)²+(6-4)²)

On the basis of this matrix, build another matrix namely L[n][n], here the value of n is 10, of course.

Matrix L[10][10] looks something like this-

	Point A(2,6)	Point B(3,4)	Point C(3,8)	Point D(4,7)	Point E(6,2)	Point F(6,4)	Point G(7,3)	Point H(7,4)	Point I(8,5)	Point J(7,6)
Point A(2,6)	-	1	0	0	0	0	0	0	0	0
Point B(3,4)	1	-	0	0	0	0	0	0	0	0
Point C(3,8)	1	0	-	0	0	0	0	0	0	0
Point D(4,7)	0	0	1	-	0	0	0	0	0	0
Point E(6,2)	0	0	0	0	-	0	1	0	0	0
Point F(6,4)	0	0	0	0	0	-	0	1	0	0
Point G(7,3)	0	0	0	0	0	0	-	1	0	0
Point H(7,4)	0	0	0	0	0	1	0	-	0	0
Point I(8,5)	0	0	0	0	0	0	0	1	-	0
Point J(7,6)	0	0	0	0	0	0	0	0	1	-

On the basis of this matrix, value of k can be determined easily. For instance, in this example points B and C can be clustered together. Similarly, points F,G ans I can be clustered in one cluster. point A can be clustered alone. point D can be clustered alone. point E can be clustered alone point H can be clustered alone point J can be clustered alone

Hence the value of K is 7.

[edit]Future Scope

Value of k can be further optimised as follows-
In the example illustrated above point A is closer to point B. Hence point A can be clustered into B and C. And so as the case with G. Forwarding in this way, we can obtained an optimised value of K.

Computer Science & Life

Search This Blog

An algorithm to find the value of k in k-means clustering algorithm

Introduction

[edit]The Algorithm

[edit]Future Scope

Comments

Post a Comment