K-MEDOIDS Clustering essay

The volume of information available in the modern world keeps growing, and with it the importance of knowledge discovery in databases. Numerous approaches and algorithms have emerged for extracting new knowledge from existing data sets. An important step in knowledge discovery is data mining: the process of discovering patterns in large volumes of data using algorithms that draw on artificial intelligence, machine learning, statistics and related fields (Theodoridis & Koutroumbas, 2008). Using data mining techniques, researchers can derive new knowledge and find meaningful relationships in the data.

A powerful data mining technique is clustering. Clustering is the unsupervised identification of patterns and the grouping of data into separate clusters according to a chosen similarity or distance metric (Espinoza, 2012). Clustering resembles classification in that it identifies similar objects, but it differs in that the groups are not defined in advance: it helps to discover new groups in the data and new ways of classifying and studying them. Clustering plays an important role in research because it provides a basis for forming hypotheses, identifying relationships between variables and making inferences about the factors that shape the clusters.
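To make the idea of grouping by a distance metric concrete, the following is a minimal sketch of k-medoids, the method named in the title. It is not the program developed in this research. It assumes the simple alternating variant (assign each point to its nearest medoid, then pick the most central member of each cluster as the new medoid), Manhattan distance, and the first k points as starting medoids; the names `manhattan` and `k_medoids` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (L1) distance; an assumed metric, any other could be passed in."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points: List[Point], medoids: List[int],
           dist: Callable[[Point, Point], float]) -> List[int]:
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points: List[Point], k: int,
              dist: Callable[[Point, Point], float] = manhattan,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids sketch.

    Returns (medoid indices into `points`, cluster label per point).
    Assumption: the first k points serve as the initial medoids.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Empty cluster (possible with duplicate points): keep old medoid.
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total distance
            # to all other members of its cluster.
            best = min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # medoids stopped changing, so the clustering is stable
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)
```

Calling `k_medoids(points, 2)` on a list of numeric tuples returns the indices of the two medoids and one cluster label per point. Unlike a mean, a medoid is always an actual observation from the data set, so each cluster can be summarized by a real case.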

Clustering is of particular value in healthcare research, where it can be used to identify risk groups, to search for the causes of diseases and their cures, and to provide more efficient service to patients (Milley, 2000). The purpose of this research is fourfold: to develop a program that determines clusters in a data set of cases from a study conducted at the University of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for breast cancer (UCI, 1999); to compare the performance of this program with that of an existing data mining program; to analyze the resulting patterns and consider the usefulness of both options; and to draw conclusions for future research in this area.